This paper will explore the pricing model of ‘gacha games’, a genre of predominantly mobile games whose primary monetization method is the sale of ‘gacha’ draws. These draws give players a random in-game item, where each item’s rarity corresponds to its draw probability. something, something… lotteries, something, something… technically not gambling, something, something… ethics?
2 Introduction
Whilst ‘gacha’ is generally understood to be a form of gambling, this is not strictly true. Both gacha and gambling involve exchanging money for a chance of winning a ‘prize’; the key difference is in the payoff: gambling wins you money (more often than not a negative amount), whereas gacha wins you in-game items. Being lucky when gambling gives you the means to gamble more: your winnings increase the money you can place on the next bet. Being lucky in gacha gives you the means to gacha less.
3 Literature Review
I hope no one has done this before.
4 Theory
Gacha is a series of lotteries: each pull has a set of possible outcomes, and each outcome has a probability of being drawn. Let us call each pull \(L_k\), where \(k\in \mathbb{N}_{++}\) is the number of the pull. Let the set of outcomes (i.e. the in-game items) be a fixed, finite set \(X=\{x_1, x_2, x_3, \ldots, x_n\}\). Each outcome \(x_i\in X\) is drawn with probability \(p_i\), and the discrete probability distribution over the outcomes is given by \(P(x)\). To illustrate, consider the following example:
Imagine that you want a new sword for your character, so let us assume the items are all swords. Next, assume swords come in 3 different rarities: common (call it a Short Sword), drawn with a probability of 90%; rare (call it an Uchigatana), drawn with a probability of 9%; and legendary (call it the Sword of Night), drawn with a probability of 1%.
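As a concrete sketch, a single pull under these rates can be simulated in a few lines of Python (the item names and rates are just the ones from this example; the function name is mine):

Code
import random

RATES = {"Short Sword": 0.90, "Uchigatana": 0.09, "Sword of Night": 0.01}

def pull() -> str:
    """Draw one item according to the stated rarity probabilities."""
    r = random.random()
    cumulative = 0.0
    for item, p in RATES.items():
        cumulative += p
        if r < cumulative:
            return item
    return item  # guard against floating-point edge cases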
Clearly, the likelihood of obtaining the rarest, and often best, items is very low; thus players generally need to pull many times before obtaining the Sword of Night they desire. Furthermore, pulling more times only increases the probability of obtaining the Sword of Night, it does not guarantee it. This unfortunate fact leads to the next section, which addresses this issue.
4.1 Pity
If you really wanted the Sword of Night, it would really suck if you pulled over and over, 100+ times, and still didn’t get it. Whilst this is still a possibility in some gacha games, nearly all implement a ‘pity system’, which effectively sets a maximum on the number of unsuccessful pulls. This ‘hard pity’ is determined by the probability of the rarest items, the cost of a pull, and the desired profit margin of the game developers. A typical number for hard pity is around 100 pulls, so for our example let us choose exactly 100.
The so-called ‘hard pity’ is not the only pity mechanism in place, as the likelihood of the rarest items increases the closer to hard pity you get. The number of pulls required to start the increase in probability is called the ‘soft pity’. This is generally around 70-80 pulls, so let us choose 75. In many jurisdictions game developers must disclose the probabilities of each item, but they are not required to disclose the soft pity numbers, nor the probabilities of the items after this point. Thus, let us assume a linear increase in probability, from 1% at 74 pulls to 100% at 100 pulls. Such a probability function, conditional on the pity \(k\), where the event \(x\) is success, would be:
\[
P(x|k) = \begin{cases}
0.01 & \text{if } k < 74 \\
0.01 + \frac{0.99}{26}(k+1-74) & \text{if } 74 \le k \le 99
\end{cases}
\tag{1}\]
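For concreteness, equation (1) translates directly into code; a minimal Python sketch (function and parameter names are mine, with the soft and hard pity values assumed above):

Code
def p_success(k: int, p_base: float = 0.01, k_soft: int = 74, k_hard: int = 99) -> float:
    """P(x|k): probability the next pull succeeds, given k failures so far."""
    if k < k_soft:
        return p_base
    # linear ramp from soft pity up to certainty at hard pity
    return min(1.0, p_base + (1 - p_base) * (k + 1 - k_soft) / (k_hard + 1 - k_soft))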
Using this we can graph the probabilities (conditional on no successes) for the Sword of Night in Figure 1 below. Note that the x-axis is the pity count (given by \(k\)), which is always 1 less than the number of pulls (given by \(n\)).
Figure 1: Conditional Probability of Success, P(x|k)
The second-rarest items generally also have a pity system, whereby you are guaranteed to receive at least 1 of them every 10 pulls or so. The number of pulls required to guarantee a rare item is openly stated, and is usually 10. Further, this guarantee is inclusive of rarer items as well, meaning that every 10 pulls you are guaranteed at least 1 rare item, but it could be a legendary item. The probability that this guaranteed item is of legendary rarity is usually unchanged (1%), implying that the probability mass of the common item is transferred to the rare item (90% + 9% = 99%).
From the above distribution we can also calculate the expected number of pulls required to obtain the Sword of Night: first find the expected pity, then infer that the expected number of pulls is 1 more than this. Call the number of pulls for a success \(\bar{n}\). The expected pity, conditional on success, is the sum over \(k\) of the probability that the first success occurs at pity \(k\), weighted by \(k\):
\[
E(\bar{n}|x) = 1 + E(k|x) = 1 + \sum_{k=0}^{99} k\, P(k|x)
\tag{2}\]
Note that the probability of success \(x\) at pity \(k\) depends on the probability of \(k\) previous failures. Thus, the joint probability of success at pity \(k\) (i.e. success on pull \(k+1\)) is given by:
\[
P(x\cap k) = P(x|k)\,P(k), \qquad P(k) = \Pi_{i=0}^{k-1}\bigl(1-P(x|i)\bigr)
\tag{3}\]
In the case where the pity is zero, the probability of success is simply the conditional probability of success in the first pull, as the probability of zero pity is 1. All subsequent probabilities of success are conditional on the previous failures, and are thus defined slightly differently. In every case, as each pull is independent, we can deduce that \(P(k)=P(k-1)(1-P(x|k-1))\) for \(k>0\).
As the probability of success is 1% below 75 pulls, and follows equation (1) from 75 pulls upwards, we can input these into equation (3):
\[
P(x\cap k) = \begin{cases}
0.01\,(0.99)^{k} & \text{if } k < 74 \\
P(x|k)\,(0.99)^{74}\,\Pi_{i=74}^{k-1}\bigl(1-P(x|i)\bigr) & \text{if } 74 \le k \le 99
\end{cases}
\tag{4}\]
Figure 2: Joint Probability of Success and Pity, P(x ∩ k)
The joint probability of success and pity falls slowly, by a factor of 0.99 per pull, until the soft pity at 74, where it rises sharply due to the increased (conditional) probability of success, and then falls back down due to the decreasing probability of reaching such high levels of pity. This decreasing probability is precisely because the (conditional) probability of success is rising, and thus the (conditional) probability of failure is falling.
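Figure 2 can be reproduced by looping over the recursion \(P(k)=P(k-1)(1-P(x|k-1))\); a sketch reusing the p_success function above:

Code
def joint_probs(k_hard: int = 99) -> list[float]:
    """P(x ∩ k) for k = 0..k_hard, via P(k) = P(k-1)(1 - P(x|k-1))."""
    probs, p_k = [], 1.0  # P(0) = 1: zero pity is certain at the start
    for k in range(k_hard + 1):
        probs.append(p_success(k) * p_k)  # P(x ∩ k) = P(x|k) P(k)
        p_k *= 1 - p_success(k)           # advance to P(k+1)
    return probs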
We can make some further inferences from this joint probability distribution. The probability of obtaining the Sword of Night in the first 74 pulls is 52.47%. This can be calculated in two ways: either by subtracting the probability of no successes \((0.99^{74})\) from 100%, or by summing the joint probabilities from the above figure from 0 to 73 pity:
\[
\sum_{k=0}^{73} 0.01\,(0.99)^{k} = \sum_{k=0}^{73}\bigl[(0.99)^{k} - (0.99)^{k+1}\bigr] = 1 - (0.99)^{74} \approx 52.47\%
\tag{5}\]
All but one term from each sum cancels out, leaving only \(0.99^{0}-0.99^{74}\), and hence (5).
From the joint probability it is a simple step to get to the probability of pity \(k\), conditional on success (thanks Bayes):
\[
P(k|x) = \frac{P(x\cap k)}{P(x)}
\]
But what is the probability of success \(P(x)\)? This probability can be difficult to find in practice, but luckily here it is rather simple: if you pull 100 times, you are guaranteed to succeed, thus \(P(x)=1\). I made a slightly sneaky assumption in the above calculations, namely that the probability of success is 100%, which I did by assuming that one pulls until one succeeds. The reason for this is that the domain for the pity should be complete, which requires enough pulls to make all values of pity possible: if you only do 50 pulls then it is impossible to have a pity above 50. More on this later.
Assuming still that we will keep pulling until success (and no more), we can calculate the expected number of pulls required to obtain the Sword of Night as:
\[
E(\bar{n}|x) = 1 + \sum_{k=0}^{99} k\, P(k|x) = 1 + \sum_{k=0}^{99} k\, P(x\cap k)
\tag{6}\]
These densities can be seen in Figure 3; their weighted sum (plus 1) yields our expected number of pulls for success. This value for our example is 55.26. This is the number of pulls you should expect to make in order to get the Sword of Night, and therefore the basis for the price of the item.
The mean does not tell us the entire story, as the distribution is by no means symmetric: it is heavily weighted around and above a pity of 74. Using Figure 2, the median can be seen to be 68, with 25th and 75th percentiles of 28 and 78 respectively; the mode is also 78.
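These summary statistics can be checked numerically from the joint distribution; a sketch using the joint_probs function above (the mean follows equation (6), the quantiles come from cumulative sums):

Code
joint = joint_probs()
mean_pulls = 1 + sum(k * p for k, p in enumerate(joint))   # equation (6), about 55.26
mode_pity = max(range(len(joint)), key=joint.__getitem__)  # about 78

def pity_quantile(q: float) -> int:
    """Smallest pity k whose cumulative joint probability reaches q."""
    running = 0.0
    for k, p in enumerate(joint):
        running += p
        if running >= q:
            return k

median_pity = pity_quantile(0.5)  # about 68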
A brief aside on the number of pulls
In the previous section, I assumed that the number of pulls is sufficiently large to guarantee success. This sets the probability of success to 1, and allows the full range of pity to be possible. However, I want to consider the case where this is not true: where the number of pulls may not be sufficient to guarantee success.
This can still be modelled, but requires truncating the distributions according to the number of pulls, and therefore a more complex probability of success. Instead of an unconditional probability of success, it will be conditional on the number of pulls, but still not conditional on the pity. The above probabilities will be the same, but additionally conditioned on the number of pulls, with the exception that when the pity is at or above the number of pulls, its probability is 0. The probability of success conditional on the pity does not change when also conditioning on \(n\); the probability that changes is the probability of reaching a pity of \(k\). Therefore, \(P(x|k, n)\) is the same as \(P(x|k)\), given by equation (1). This implies that \(x\) is independent of \(n\), conditional on \(k\).
The joint probability of success and pity, conditional on the number of pulls, is given by equation (7). This is the same as (3), but with the added condition that \(k<n\). As the probability of \(k\) being equal to or larger than the number of pulls \(n\) is 0, the joint probability of success and pity is 0 for \(k\geq n\):
\[
P(x\cap k|n) = \begin{cases}
P(x|k)\,P(k) & \text{if } k < n \\
0 & \text{if } k \ge n
\end{cases}
\tag{7}\]
This can be seen in Figure 4, where the x-axis is still the pity, like in Figure 2, but one can use the slider to see the effect of changing the number of pulls.
Figure 4: Joint Probability of Success and Pity, P(x ∩ k|n)
Code
viewof n = Inputs.range( [1,100], {step:1,value:100,label:'Number of pulls',id:'n'})
One may notice that for \(n\) below hard pity, the joint probabilities, conditional on \(n\), no longer add to 1. Their sum is the probability of success, conditional on \(n\), given by the sum over the joint probabilities, conditional on \(n\), for \(k<n\):
\[
P(x|n) = \sum_{i=0}^{n-1} P(x\cap i|n)
\]
Note that this is increasing in \(n\), reflecting that the probability of success increases the more pulls you do, until it reaches 1 at hard pity. From this we can find the expected number of pulls to get a success \(\bar{n}\), conditional on doing only \(n\) pulls:
\[
E(\bar{n}|x, n) = 1 + \sum_{k=0}^{n-1} k\, P(k|x, n) = 1 + \sum_{k=0}^{n-1} k\, \frac{P(x\cap k|n)}{P(x|n)} = 1 + \frac{1}{\sum_{j=0}^{n-1} P(x\cap j|n)}\sum_{k=0}^{n-1} k\, P(x\cap k|n)
\]
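A direct translation of this expectation, truncating the joint distribution at \(n\) (again reusing joint_probs):

Code
def expected_pulls_given_n(n: int) -> float:
    """E(n̄ | x, n): expected pulls for a success when at most n pulls are made."""
    joint = joint_probs()[:n]      # P(x ∩ k | n) is zero for k >= n
    p_x_given_n = sum(joint)       # P(x | n)
    return 1 + sum(k * p for k, p in enumerate(joint)) / p_x_given_n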
These joint densities can be seen in Figure 5, where the x-axis is still the pity, like in Figure 3, but one can use the same slider above to see the effect of changing the number of pulls.
The distribution is unchanged from Figure 3 when \(n=100\), but as \(n\) falls the rest of the distribution grows. This happens because, as the probability of success decreases with fewer pulls, the densities are scaled up in the renormalisation.
To show the effect of the number of pulls on the expected number of pulls to get a single success, see the figure below.
As we can see, much of the expected number of pulls for a success is driven by the probability of success (conditional on the number of pulls), which is increasing in the number of pulls. Each interval, one before and one after soft pity, is also concave, whilst always remaining below a \(45^{\circ}\) line.
4.2 Sparks
Not all gacha systems include a pity system; some instead rely on a ‘spark’ system. This is similar to the pity system in that it caps the number of pulls needed to acquire the desired item. The difference is that the spark system does not manipulate probabilities, but rather guarantees the choice of the item after a certain number of pulls. A common threshold is 200 pulls: if the (pre-chosen) desired item has not been obtained by that point, one can choose to exchange ‘sparks’ for it, where a single spark is obtained for every failed pull.
Expected values for the number of pulls are simpler to calculate under this system, as the number of pulls for a success follows a simple geometric distribution. The pity system is also geometric in structure, but its probability of success is not constant and depends on the number of pulls, whereas the probability of success in the spark system is constant:
\[
P(n=\bar{n}) = (1-p)^{\bar{n}-1} p
\]
The expected number of pulls to obtain the desired item, accounting for the guarantee at the spark threshold, is given by equation (8):
\[
E(\bar{n}) = \sum_{n=1}^{200} n\,p(1-p)^{n-1} + 200\,(1-p)^{200} = \frac{1-(1-p)^{200}}{p} \approx 86.6
\tag{8}\]
This expected value is based on the assumption that one will do all 200 pulls if necessary, so success is guaranteed. Unsurprisingly, with the same probability of success, the expected number of pulls is larger than under the pity system. I have also simulated the spark system 100,000 times (with \(p=0.01\) and 200 sparks needed for a legendary), and the distribution of the number of pulls can be seen in Figure 7. The statistics from these simulations, and the predicted amounts, can be seen in the table below.
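The simulation itself is only a few lines; a sketch of how the draws might be generated (the benchmark value is the truncated-geometric mean from equation (8)):

Code
import random

def pulls_for_spark_item(p: float = 0.01, spark: int = 200) -> int:
    """Pulls needed for the target item: a natural win, or spark redemption at 200."""
    for n in range(1, spark + 1):
        if random.random() < p:
            return n
    return spark  # enough sparks accumulated: exchange them for the item

sims = [pulls_for_spark_item() for _ in range(100_000)]
print(sum(sims) / len(sims))  # should land near (1 - 0.99**200) / 0.01 ≈ 86.6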
Note that there are many observations above the spark threshold of 200, meaning it took more than 200 pulls to obtain a legendary item naturally. One can assume that the player would simply choose the legendary item at 200 pulls, and would only continue pulling if they wanted further copies, or if a new limited-time item were released (in which case the spark count would reset, but their previous pull count would not). We can therefore view these observations as cases where the player selects a number of legendary items through the spark system and then obtains a legendary item through chance after the observed pulls.
4.3 50/50
Apologies, I should have included a trigger warning here for the gacha gamers reading this. The dreaded ‘50/50’ has turned great excitement into extreme disappointment and anger. The reason is that, in most gacha systems, when you get a legendary item, you are not always guaranteed that it is the legendary item you want. In fact, it can be a legendary item you really don’t want.
Until now, I’ve assumed a single item of the highest rarity (in fact, of any rarity). However, usually there is a set of legendary items, of which 1 will be chosen. This set will include a single limited-time item (normally more powerful, better looking, more fun) and a subset of ‘standard’ items. When you obtain a legendary item, there is a 50% probability that it is the limited-time item, and a 50% probability that it is any one of the ‘standard’ items. This is known as the 50/50.
Typically, losing the 50% chance of the legendary item of choice will guarantee it for the next legendary item obtained, limiting the number of times one can lose this 50/50. Some systems have variations with more than one limited-time item, where the 50/50 is split across them, with a separate probability determining whether one obtains a limited legendary item or a ‘standard’ one. For example, assume that there are 2 limited-time legendary items. On pulling a legendary, a random draw determines whether this legendary is one of the 2 limited-time items or a standard item, with probabilities of, say, 75% and 25%. If a limited-time item is chosen, there is a 50% chance of it being either of the 2 limited-time items; if a standard item is chosen, the probability is split uniformly across however many standard items are in the set.
Knowing that your chance of getting the limited-time item is not certain, even when you get a legendary item, will change the expected number of pulls, but it is not obvious by how much. Instinct may tell us that the expected number of pulls will be halfway between winning on the first legendary item, and losing the 50/50 and being guaranteed it on the next legendary. The expected value will, by definition, be between these two values, but the expectation is not the sum of the expectations of 2 legendary pull sequences. This is because the latter sequence is conditional on the first: 50% of the time (on average) you win on the first legendary, erasing the need to keep pulling.
Let us continue with the example used previously, but add a 50/50 mechanic. The number of pulls needed to get a single limited-time legendary now ranges between 1 pull (if you’re lucky) and 200 pulls (if you’re really unlucky). Again, assume for simplicity that one will do enough pulls to guarantee the desired item, such that \(p(x)=1\). We can then calculate the expected number of pulls needed to get the limited-time legendary using the same conditional probability distribution in Figure 1, but changing the probability of observing each pity number to reflect the 50/50. The key difference is that some values of the number of pulls for a success have (broadly) 2 different paths to them. For example, a successful pull of the limited-time legendary after 10 pulls can be achieved either by winning the 50/50 on the 10th pull, or by losing the 50/50 before this but getting another legendary on the 10th pull. These are the 2 broad paths, but there are actually 10 distinct possibilities: winning the 50/50 on the 10th pull is one, and there are 9 different ways to lose the 50/50 before the 10th pull and then get the limited-time legendary on the 10th pull. The probability of getting the limited-time legendary on the 10th pull is therefore the sum of the probabilities of these 10 paths. For this example, the probability is:
\[
p(\bar{n}=10) = \underbrace{\tfrac{1}{2}\,(0.99)^{9}(0.01)}_{\text{win 50/50 on pull 10}} + \underbrace{\tfrac{1}{2}\,9\,(0.01)^{2}(0.99)^{8}}_{\text{lose 50/50, then win on pull 10}}
\]
Note that I have used a probability of success of 0.01, as we are well before the soft pity. If we use general terms for the probability conditional on the pity, \(p(x|k)\), we have:
\[
p(\bar{n}=10) = \underbrace{[\Pi_{i=0}^{8}(1-p(x|k=i))]}_{\text{1st 9 are losses}} \underbrace{p(x|k=9)}_{\text{1 win in 10}} \underbrace{(0.5)}_{\text{win 50/50}} + \text{Pr(8 losses and 2 wins)} \underbrace{(0.5)}_{\text{lose 50/50}}
\]
The term I have not yet calculated is more complex, as it needs to contain the multiple different ways of achieving \(n-2\) losses and 2 wins, specifically with the second win at \(n\). To calculate this, see the table of probabilities below. Each row represents when the first win occurs; the probability of that sequence is the product of the four entries in the row.
| First win \(m\) | \(\Pr(m-1 \text{ losses})\) | \(\Pr(\text{win in pull } m)\) | \(\Pr(\text{loss until pull } 9)\) | \(\Pr(\text{win in pull } 10)\) |
|---|---|---|---|---|
| 1 | \(1\) | \(p(x\mid k=0)\) | \(\Pi_{i=0}^{7}(1-p(x\mid k=i))\) | \(p(x\mid k=8)\) |
| 2 | \((1-p(x\mid k=0))\) | \(p(x\mid k=1)\) | \(\Pi_{i=0}^{6}(1-p(x\mid k=i))\) | \(p(x\mid k=7)\) |
| 3 | \(\Pi_{i=0}^{1}(1-p(x\mid k=i))\) | \(p(x\mid k=2)\) | \(\Pi_{i=0}^{5}(1-p(x\mid k=i))\) | \(p(x\mid k=6)\) |
| 4 | \(\Pi_{i=0}^{2}(1-p(x\mid k=i))\) | \(p(x\mid k=3)\) | \(\Pi_{i=0}^{4}(1-p(x\mid k=i))\) | \(p(x\mid k=5)\) |
| 5 | \(\Pi_{i=0}^{3}(1-p(x\mid k=i))\) | \(p(x\mid k=4)\) | \(\Pi_{i=0}^{3}(1-p(x\mid k=i))\) | \(p(x\mid k=4)\) |
| 6 | \(\Pi_{i=0}^{4}(1-p(x\mid k=i))\) | \(p(x\mid k=5)\) | \(\Pi_{i=0}^{2}(1-p(x\mid k=i))\) | \(p(x\mid k=3)\) |
| 7 | \(\Pi_{i=0}^{5}(1-p(x\mid k=i))\) | \(p(x\mid k=6)\) | \(\Pi_{i=0}^{1}(1-p(x\mid k=i))\) | \(p(x\mid k=2)\) |
| 8 | \(\Pi_{i=0}^{6}(1-p(x\mid k=i))\) | \(p(x\mid k=7)\) | \((1-p(x\mid k=0))\) | \(p(x\mid k=1)\) |
| 9 | \(\Pi_{i=0}^{7}(1-p(x\mid k=i))\) | \(p(x\mid k=8)\) | \(1\) | \(p(x\mid k=0)\) |
This table does show a pattern, but as the 1st and last rows contain a degenerate probability of 1, the generalisation must exclude them from the larger summation. Therefore, the general formula contains 2 main terms: the combination of the first and last rows, which are equal due to the symmetry here, and the summation of all the rows in between. Note that this only works for \(n\ge4\), such that the second summation term contains any terms. I have also changed the notation of the conditional probabilities such that the pity value conditioned on is in the subscript; for example, \(p(x|k=0)\) is now \(p_0\):
\[
p(\bar{n}=n) = \tfrac{1}{2}\,p_{n-1}\Pi_{i=0}^{n-2}(1-p_i) + \tfrac{1}{2}\Bigl[2\,p_0\,p_{n-2}\Pi_{i=0}^{n-3}(1-p_i) + \sum_{m=2}^{n-2}\Bigl(\Pi_{i=0}^{m-2}(1-p_i)\Bigr)p_{m-1}\Bigl(\Pi_{i=0}^{n-m-2}(1-p_i)\Bigr)p_{n-m-1}\Bigr]
\tag{9}\]
Somehow my simplifications make this look more complex. Surely it simplifies… right? One way to simplify the calculation is to separate the equation before \(n=75\) (i.e. soft pity plus 1) and after. Before soft pity, all marginal probabilities are the same \((p(x|k)=p=0.01)\), so the equation simplifies to:
\[
p(\bar{n}=n) = \tfrac{1}{2}\,p(1-p)^{n-1} + \tfrac{1}{2}(n-1)\,p^{2}(1-p)^{n-2}
\]
Equation (9) is not valid for values of \(n\) below 4, so we can only use it above this. I will therefore state the formulas for \(n=1\), 2, and 3 here:
\[
p(\bar{n}=1) = \tfrac{1}{2}\,p_0, \qquad
p(\bar{n}=2) = \tfrac{1}{2}\,p_1(1-p_0) + \tfrac{1}{2}\,p_0^{2}, \qquad
p(\bar{n}=3) = \tfrac{1}{2}\,p_2(1-p_0)(1-p_1) + p_0\,p_1(1-p_0)
\]
The next range to look at is \(76\le n\le148\), as this is the range where 2 successful pulls could include one from after soft pity, but could also avoid soft pity entirely. For example, one could lose the 50/50 on pull 74 but win on the very next pull (75), or one could have to wait another 74 pulls to win (148). The key difference between this range and the previous one is that it is now possible to pass soft pity. This complicates the probabilities, but only in that they are no longer the same for each pull; the logical process is the same. Suppose one gets a success after 100 pulls. As in the table above, knowing that there must be 1 50/50 loss, there are 99 different paths to this point, each defined by where this 50/50 loss occurs. If the 50/50 loss occurred anywhere between (and including) pulls 26 and 74, then we need not worry about soft pity for either win. Outside these bounds, one of the legendary wins occurred after soft pity, and therefore used a higher conditional probability of success.
For any \(n\in[76, 148]\), we need not use the post-soft-pity probabilities if the pull of the first win \(m\in[n-74,74]\).
We can use equation (9) generally for all values of \(n\ge4\), but it may be easier to use a constant \(p\) for any pity value less than 74, and separate the probabilities likewise.
The last range to assess is \(n\ge149\). This is the range where it is guaranteed that at least one of the legendary wins occurs after soft pity. At \(n=149\), this will be one win after 74 pulls and the other after a further 75; for values of \(n\) above this, both wins can fall after soft pity.
The blue series shows outcomes where the 50/50 is won, and therefore reflects Figure 1. The orange series shows outcomes where the 50/50 was lost previously and the guaranteed win occurs on this pull. This series is influenced by the conditional probability distribution, in that it has peaks after the soft pity and after 2 times the soft pity, as these are the most likely places for wins to occur.
A clear difference between the series is in the first pulls until soft pity: the blue series is falling, as a higher number of pulls needed implies more failures, whereas the orange series is rising, as it requires two wins, and the number of possible ways of achieving this increases with the number of pulls needed. This is also why the orange series falls from the peak after the first soft pity until the second soft pity: the hard pity starts to reduce the number of combinations of pulls that can yield 2 wins. For example, compare obtaining a guaranteed legendary (after losing the 50/50) after 100 pulls to after 140 pulls: there are 99 different ways to win twice in 100 pulls with the second win at 100, but only 61 different ways to win twice in 140 pulls with the second win at 140, as the first win must occur between pulls 40 and 100 (inclusive).
Descriptive statistics of the above distribution can be found below in Table 2. The table includes these statistics for winning the 50/50, losing the 50/50, and the total probability of success. The total probability is the probability actually faced, and what determines the price.
Table 2: Descriptive Statistics for the 50/50 System
| | Mean Pull No. | Modal Pull No. | 25% Percentile | Median Pull No. | 75% Percentile |
|---|---|---|---|---|---|
| 50/50 Win | 55.26 | 79 | 29 | 69 | 79 |
| 50/50 Loss | 110.52 | 159 | 85 | 110 | 149 |
| Combined | 82.89 | 79 | 52 | 80 | 110 |
One can see that the main effect of the 50/50 system is to increase the expected number of pulls needed to obtain the desired item. Every statistic in Table 2 is higher with the 50/50 (except the mode, which is the same), with the mean increasing by 49.99999999999997%. I have not rounded this number up to 50% yet, as it is so suspiciously close to 50% that this may not be a coincidence. I will check this below (and will likely put it in an appendix).
On the face of it, this result does make sense: half the time, it will take twice as long to obtain the desired item; therefore, on average, the number of pulls needed should increase by 50%. Comparing the mean pull numbers for a 50/50 win and loss, it takes twice as many pulls on average to get the desired item if you lose the 50/50 compared to winning it. If you compare the un-rounded numbers, the exact factor is slightly less than 2; the question is whether this is due to rounding error in previous steps, or whether it is supported mathematically.
Firstly, if there were no pity system (i.e. a constant probability of success), the number of pulls for a single success would follow a geometric distribution, and the expected number of pulls for 1 success would be \(\frac{1}{p}\). Under the 50/50 system, if you lose the 50/50 then the distribution we are concerned with is a negative binomial, as we need to know how many trials are needed for 2 successes. The mean of this distribution is \(\frac{2}{p}\): exactly twice as many pulls as needed for a single success. Therefore, the question to be answered here is whether the pity system changes this result, and if so, by how much.
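This doubling logic can also be sanity-checked by simulation; a sketch of the 50/50-with-guarantee mechanic, reusing the p_success schedule from earlier (the draw count is arbitrary):

Code
import random

def pulls_for_limited(win_5050: float = 0.5) -> int:
    """Pulls until the limited item: pity resets on every legendary,
    and a 50/50 loss guarantees the next legendary is the limited one."""
    pulls, pity, guaranteed = 0, 0, False
    while True:
        pulls += 1
        if random.random() < p_success(pity):
            if guaranteed or random.random() < win_5050:
                return pulls
            guaranteed = True  # lost the 50/50
            pity = 0
        else:
            pity += 1

sims = [pulls_for_limited() for _ in range(100_000)]
print(sum(sims) / len(sims))  # should land near 1.5 × 55.26 ≈ 82.9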
Before diving into the slightly daunting maths, let us first try to replicate the result with different probabilities. For example, reduce the conditional probability of success before soft pity (call it \(\underline{p}\)) from 1% to 0.5%, or increase it to 2%. The table below shows the results for these cases. The rows labelled ‘50/50 % inc’ show the percentage increase in the mean number of pulls needed to get the desired item, compared to the mean without the 50/50 system.
Table 3: Descriptive Statistics for the 50/50 System, Hard Pity=89, Soft Pity=74 & 64, \(\underline{p}=0.5\%\), \(\underline{p}=1\%\) & \(\underline{p}=2\%\)
| Soft Pity | \(\underline{p}\) | Statistic | Mean | Modal | 25% Percentile | Median | 75% Percentile |
|---|---|---|---|---|---|---|---|
| 74 | 0.5% | 50/50 Win | 66.11 | 79 | 58 | 78 | 81 |
| | | 50/50 Loss | 132.21 | 159 | 106 | 150 | 159 |
| | | Combined | 99.16 | 79 | 77 | 84 | 150 |
| | | 50/50 % inc | 50.0 | 0.0 | 32.7586 | 7.6923 | 85.1852 |
| | 1% | 50/50 Win | 55.26 | 79 | 29 | 69 | 79 |
| | | 50/50 Loss | 110.52 | 159 | 85 | 110 | 149 |
| | | Combined | 82.89 | 79 | 52 | 80 | 110 |
| | | 50/50 % inc | 50.0 | 0.0 | 79.3103 | 15.942 | 39.2405 |
| | 2% | 50/50 Win | 40.06 | 79 | 15 | 35 | 69 |
| | | 50/50 Loss | 80.13 | 85 | 49 | 82 | 106 |
| | | Combined | 60.1 | 79 | 26 | 58 | 84 |
| | | 50/50 % inc | 50.0 | 0.0 | 73.3333 | 65.7143 | 21.7391 |
| 64 | 0.5% | 50/50 Win | 60.01 | 70 | 58 | 69 | 72 |
| | | 50/50 Loss | 120.02 | 141 | 98 | 135 | 142 |
| | | Combined | 90.02 | 70 | 68 | 76 | 135 |
| | | 50/50 % inc | 50.0 | 0.0 | 17.2414 | 10.1449 | 87.5 |
| | 1% | 50/50 Win | 51.08 | 70 | 29 | 66 | 71 |
| | | 50/50 Loss | 102.16 | 140 | 79 | 103 | 136 |
| | | Combined | 76.62 | 70 | 52 | 72 | 103 |
| | | 50/50 % inc | 50.0 | 0.0 | 79.3103 | 9.0909 | 45.0704 |
| | 2% | 50/50 Win | 38.1 | 69 | 15 | 35 | 66 |
| | | 50/50 Loss | 76.21 | 77 | 49 | 77 | 100 |
| | | Combined | 57.16 | 70 | 26 | 58 | 78 |
| | | 50/50 % inc | 50.0 | 1.4493 | 73.3333 | 65.7143 | 18.1818 |
From Table 3, the mean number of pulls needed to get the desired item increases by 49.99999999999994% in every case. Whilst this is not the exact same increase as before, it is very close, and close enough to suggest that this is not a mere coincidence. One can also see that changing the soft pity has no effect on this either, so suppose the soft pity is zero: the conditional probability distribution is then linear from pull 1 to \(N\). In this case we can use \(\underline{p}\) as the very first probability, when pity is zero, with the conditional probabilities rising from there to equal 1 at the hard pity of \(N-1\).
The conditional probability would be:
\[
P(x|k) = \underline{p} + \frac{1-\underline{p}}{N-1} k
\]
The conditional and joint probabilities are shown in Figure 9.
One can see that the initial probability does not have a comparatively large effect, as the increases in the probability are linear and therefore larger than the initial probability itself. This is due to the maximum number of pulls being 90; if we increased this, the changes in probability would be smaller, and the joint probability distribution would be flatter overall.
We could try to compare this with the 50/50 system, but that is also a bit of a mess, so let’s think about it another way… Let \(F_r \sim H(r, p)\) be the number of pulls needed for the \(r^{th}\) success in a Bernoulli process with success probability \(p\), where \(p\) may depend on the pity (the number of failures since the last success). Also let \(Y_i \sim G(p)\), where \(Y_i\) represents the number of pulls between the \((i-1)^{th}\) and \(i^{th}\) successes. Thus, we can think of \(F_r\) as the sum of the first \(r\) of the \(Y_i\):
\[
F_r = Y_1 + Y_2 + \cdots + Y_r
\]
In the case here of a 50/50 with a guarantee after a 50/50 loss, \(r=2\) if one loses the 50/50, and \(r=1\) if one wins it. Due to this sum, the expected value is easy to find, as it is just the sum of the expectations of these variables:
\[
E(F_2) = E(Y_1 + Y_2) = E(Y_1) + E(Y_2) = 2\,E(Y_1)
\]
As \(Y_1\) and \(Y_2\) follow the same distribution, the expected number of pulls for 2 successes is twice the expected number of pulls for 1 success. This is the same as under a negative binomial distribution, in which the probabilities are constant.
Importantly, I did not specify the conditional probability distribution here (which is \(G\)); therefore this applies to any valid distribution, including those above with varying starting probabilities and levels of soft and hard pity.
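This argument is easy to verify numerically: summing two independent single-success sequences drawn from the pity schedule should reproduce the 50/50-loss mean (a sketch, reusing p_success):

Code
import random

def pulls_for_one_success() -> int:
    """Pulls until a single legendary under the pity schedule."""
    pulls, pity = 0, 0
    while True:
        pulls += 1
        if random.random() < p_success(pity):
            return pulls
        pity += 1

pairs = [pulls_for_one_success() + pulls_for_one_success() for _ in range(100_000)]
print(sum(pairs) / len(pairs))  # should land near 2 × 55.26 = 110.52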
5 Optimal Pricing
Typically, as economists, when faced with uncertain outcomes, we use expected values to inform us of what the price should be. In the case of gacha, we do not want to focus on the expected value of a single pull, but rather the expected number of pulls one would need to obtain the desired item. This is easy, as I’ve already calculated it.
A key question is whether probabilistic pricing can be an optimal pricing strategy, and under what conditions. The first model shown here will have the pity system (both soft and hard), but not the 50/50. The second model will reintroduce the 50/50. The third model (if I can be bothered) will contain the spark system.
Preliminary analysis suggests that a probabilistic pricing strategy is a form of price discrimination (perhaps even perfect). Compare gacha to a fixed-price system: with a fixed price, those willing to pay that price will do so, and anyone not willing to pay will not purchase the item. With a gacha system, those willing to pay the expected cost will still do so, but those not willing to pay the expected price may still be willing to pay something for a chance at getting the item. This is how a gacha system can capture a larger proportion of consumer budgets (this ignores possible risk aversion, but the logic is sound).
In the usual tradition, to find profit we need both costs and revenues. Game development costs are typically upfront and fixed, and subsequent costs after release are things like server costs and maintenance. These variable costs are likely difficult to parse out, but for modelling purposes let us assume some fixed marginal cost per successful purchase. We could also add a smaller cost per pull for the marginal server costs of generating the pull, but this would be an aside.
The revenue clearly depends on the number of pulls, but also on the price of each pull, the probabilities of success, and the value of the items. For optimal pricing we want to compare a gacha system to a fixed price system, where this fixed price is equal to the cost of the expected number of pulls for success under the gacha system.
I will assume that the firm is a monopolist, as they are the only provider of a game whose contents are sufficiently differentiated. Gacha games do face competition from similar games (both gacha and non-gacha), and this will affect the price that they can set, but (I think) the effect will be equal under either price system, so it can be ignored.
To calculate the revenue under the gacha system, we need to categorise the player-base by willingness to pay. Call the set of active players \(A\), with size \(\bar{A}\), where player \(a\) has a valuation of the currently available item denoted \(v_a\). The valuations are in terms of the number of pulls a player is willing to do for one copy of the item.
Players’ willingness to pay is dependent on their ability to pay, so denote each player’s budget as:
\[
b_a = z_a + w_a
\]
where \(w\) is the in-game currency gained from playing the game, and \(z\) is the real money converted into in-game currency used to purchase pulls. To be consistent with the valuations above, both \(w\) and \(z\) are in terms of numbers of pulls; the in-game and real-money currency amounts can be inferred using the appropriate exchange rates (i.e. by multiplying by the in-game price of a single pull). We can use these budgets as our categories: players with zero real-money budget \((z_a=0)\) are known as free-to-play (FTP); players with a positive real-money budget \((z_a>0)\) are known as pay-to-play (PTP). We can further categorise the PTP players into five groups: those with a budget less than the expected number of pulls for success \((z_a < E[Y])\); those with a budget larger than the expected number of pulls for success but lower than hard pity \((E[Y]\le z_a<Y_{max})\); those with a budget large enough to guarantee a success but less than two expected successes \((Y_{max}\le z_a < 2E[Y])\); those with a budget larger than two expected successes but not large enough to guarantee two successes \((2E[Y]\le z_a < 2Y_{max})\); and those with a budget large enough to guarantee two successes and more \((z_a \ge 2Y_{max})\). These categories are often labelled using aquatic animals; refer to Table 4 for this information.
Table 4: Budget Categories for Players
| Category | Budget | Description |
|---|---|---|
| FTP | \(z_a=0\) | Free-to-play |
| Minnow | \(0 < z_a < E[Y]\) | Small budget, less than the expected number of pulls for success |
| Dolphin 1 | \(E[Y] \le z_a < Y_{max}\) | Budget large enough to expect a success but not guarantee one |
| Dolphin 2 | \(Y_{max} \le z_a < 2E[Y]\) | Budget large enough to guarantee a success but less than two expected successes |
| Whale | \(2E[Y] \le z_a < 2Y_{max}\) | Budget larger than two expected successes, but not large enough to guarantee two |
| Giga-Whale | \(2Y_{max} \le z_a\) | Budget large enough to guarantee two successes and more |
The reason for categorising the players like this is that their budgets will affect the number of pulls they can do, and therefore the probability of success \(P(x|n)\). It can also show us which players cannot afford the item if it had a fixed price: if the fixed price is equal to the expected number of pulls for success \(E[Y]\), then players cannot afford the item if \(b_a < E[Y]\). For example, if \(w_a=\frac{1}{2}E[Y]\), then a player needs at least \(z_a=\frac{1}{2}E[Y]\) to afford the item. Under the gacha system, players with a budget lower than the expected value of the item still have a chance to obtain the item; under a fixed-price system they do not.
We can summarise the relevant variables for each player \(a\) as \((v_a, w_a, z_a)\). The revenue for the firm is then the sum of the revenue from each player, which means that we need to specify distributions for these variables. As the players’ valuations of the items are subjective and unknown to the firm, the simplest assumption we can make is that they are uniformly distributed between 0 and hard pity (\(N=100\)). The budget variables are observable to the firm, but the observation is not realised until purchase. As the firm must set the price before the release of the game, it must make assumptions about the distributions of these variables.
5.1 Real Money Budget
The distribution of the real-money budget should not be uniform, as there is a large proportion of players for whom \(z_a=0\), and the density of players should decrease as the budget increases. Therefore, assume that the distribution of the real-money budget is a Pareto Type II distribution (Lomax) with scale parameter \(\lambda\) and shape parameter \(\alpha\). This distribution is defined as:
\[
f(z) = \frac{\alpha}{\lambda}\left(1+\frac{z}{\lambda}\right)^{-(\alpha+1)}
\]
where \(z \ge 0\). The density at zero, which I will treat as the proportion of FTP players, is \(f(0) = \frac{\alpha}{\lambda}\), so if we wish to set the proportion of FTP players to a quarter of the player-base, then we should set \(\alpha = \frac{\lambda}{4}\). As I have no data to inform me of appropriate values, I’ll make an ‘educated’ guess and set \(\lambda=8\) and \(\alpha=2\). Perhaps I can use (Gaillard et al. 2023) to inform better values, but they do use a different functional form of the Pareto distribution. Using these values, the density function is:
\[
f(z) = \frac{1}{4}\left(1+\frac{z}{8}\right)^{-3}
\]
As we want the distribution to be on \([0, 100]\), we need to normalise it by dividing by \(F(100)\). We will denote this normalised distribution as \(Lomax_{100}(\lambda, \alpha)\).
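For the simulations that follow, draws from this truncated Lomax can be generated by inverse-CDF sampling; a sketch (the function name is mine):

Code
import random

def sample_lomax_truncated(lam: float = 8.0, alpha: float = 2.0, upper: float = 100.0) -> float:
    """Inverse-CDF draw from Lomax(lam, alpha) truncated to [0, upper].
    Lomax CDF: F(z) = 1 - (1 + z/lam)**(-alpha)."""
    F_upper = 1 - (1 + upper / lam) ** (-alpha)
    u = random.random() * F_upper  # uniform on [0, F(upper)]
    return lam * ((1 - u) ** (-1 / alpha) - 1)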
This gives us enough information to surmise the distribution for the in-game budget, but for ease let us assume that players’ valuations are independent of their real-money budgets (maybe not realistic, but definitely much easier to deal with).
5.2 In-game Currency
The simplest assumption is that the in-game budget is uniformly distributed between some minimum and maximum values of in-game currency \([w_{min}, w_{max}]\), but this is unlikely to be appropriate, as the distribution depends on decisions made by the players. We can assume the firm knows this minimum and maximum, and for simplicity assume the players know it too (in reality they don’t, but can likely make a reasonable estimate). Assume that earning in-game currency has a fixed marginal cost of \(c\) per pull (likely actually increasing, but oh well), which is effectively the combined cost of effort and time (opportunity cost), minus any benefits, since earning in-game currency usually means playing the game, which is (hopefully) fun. This effectively reduces the valuation of the item by adding a cost based on how much in-game currency needs to be earned. This is what gives the incentive to use real currency: it is, in a sense, cheaper, as it does not reduce your valuation.
Let player \(a\) have a valuation of the item of \(v_a\) and a real-money budget of \(z_a\); the decision they need to make is how much in-game currency \(w_a\) to earn. The optimal choice is to earn no more than the amount by which their valuation of the item exceeds their existing real-money budget, which implies (assuming no discounting, or assuming it is accounted for in the valuations):
\[
w_a = v_a - z_a
\]
I am going to assume the linear cost case for simplicity from now on. Also, it should be noted that the above equation should be kept within the possible bounds for \(w\):
\[
w_a = \min\bigl(\max(v_a - z_a,\; w_{min}),\; w_{max}\bigr)
\]
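As a one-line sketch of this choice rule (the default bounds are placeholders, not values from the text):

Code
def optimal_w(v: float, z: float, w_min: float = 0.0, w_max: float = 70.0) -> float:
    """Earn in-game currency up to the valuation net of the real-money budget,
    clipped to the feasible range [w_min, w_max] (placeholder bounds)."""
    return min(max(v - z, w_min), w_max)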
Figure 10 shows this in-game currency budget as a function of the real-money budget, and one can move the valuation to see the in-game currency budget increase with it. The top plot shows the negative relationship between the two budgets, and the bounds of the in-game currency budget. The density of the distribution at the chosen value of \(v_a\) is shown in the bottom plot. (I might make a 3D plot of this.)
Figure 10: Distributions of In-Game Currency Budget
The above graph basically just shows that there is very little difference between the linear and convex cases, except in the magnitude of the cost. The convex case increases the cost to a level where the maximum in-game currency won’t be reached. Whilst the convex case may be more realistic, the linear case is much easier to deal with, so, for simplicity, I will use the linear case going forward.
5.3 Distribution of Variables
I will start with these 2 simpler distributions, as they are convenient, but it may be more accurate for the valuations to follow a distribution similar to the Pareto, or some truncated normal. To summarise: \(v_a \sim U(0, 100)\); \(z_a \sim Lomax_{100}(8, 2)\); and \(w_a = \min(\max(v_a - z_a, w_{min}), w_{max})\).
Further, these distributions are all for continuous random variables, whereas the measurement scale for the variables (the number of pulls) is discrete. As the continuous distributions are easier to deal with, I will overlook this: the variables represent conversions from real money into a discrete number of pulls, so we can think of them as continuous, understanding that a value of 15.8 pulls, for example, is equivalent to \(15.8 \times \text{price of a single pull}\) in real money. One should also note that a player could have a non-integer valuation, but the realisation of this (in terms of the number of pulls desired) would need to be rounded down to the nearest integer.
To picture this, these distributions are shown in Figure 11. The distribution shown for the in-game currency is descriptive of the real distribution, which is not well defined but whose shape was found through simulation. The actual distribution found through 10,000 simulations is shown in the appendix.
Figure 11: Distributions of Values and Budget Variables
One can see from this parameterisation that the median real-money budget is 3.3 pulls, whilst the median in-game currency budget is 41 pulls. I set the median valuation at 50 pulls, which means that the median player would not value the item more than the in-game currency that they could earn.
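The simulated population can be assembled from the pieces defined above; a sketch reusing sample_lomax_truncated and optimal_w (the bounds are again placeholders):

Code
import random

def sample_player(w_min: float = 0.0, w_max: float = 70.0):
    """Draw one player's (valuation, real-money budget, chosen in-game budget)."""
    v = random.uniform(0, 100)    # valuations uniform on [0, 100]
    z = sample_lomax_truncated()  # real-money budget, truncated Lomax(8, 2)
    w = optimal_w(v, z, w_min, w_max)
    return v, z, w

players = [sample_player() for _ in range(10_000)]
budgets = [z + w for _, z, w in players]  # full budgets b_a = z_a + w_a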
Below I have combined the two budgets into the full budget. This was found by simulation again and then stylised in the diagram below. The valuations and cost of earning in-game currency are also shown.
Figure 12: Distribution of full budget and valuation
Figure 12 shows two spikes in density around both the minimum in-game currency and the maximum in-game currency. The first spike can be characterised as the proportion of players that have a real-money budget at least as large as their valuation of the item, and therefore have no need for any further in-game currency above the minimum. The second spike can be characterised as the proportion of players who, not only have a real-money budget less than their valuation of the item, but sufficiently less such that they need at least the maximum available in-game currency to be willing to pay their valuation of the item.
After each of these spikes, the density falls in a convex manner, but at varying rates. After the first spike, the density falls slowly and converges to a value close to the density of the valuations (remember that this is uniform). After the second spike, the density falls much faster and seems to converge towards zero (which is necessary, as success is guaranteed after 100 pulls, so it makes no sense to have a budget larger than that).
5.4 Fixed Price
To illustrate a fixed-price system, assume the price of the item is \(q\) pulls. With valuations uniform on \([0, 100]\), \((100-q)\%\) of consumers would value the item more than the price, and \(q\%\) would value it less. The firm’s revenue is then the sum, over players who value the item more than the price, of the difference between the price and their in-game currency budget, excluding those who are not willing to pay real money to make up the difference:
\[
R(q) = \sum_{a\,:\,v_a \ge q,\; z_a \ge q - w_a} \max(q - w_a,\, 0)
\]
The higher the firm sets the price, the more money it makes per player, but the fewer players will be willing to pay. If the firm lowers the price below the maximum in-game currency, then it will effectively sell to some players for free, but this will increase the number of players willing to pay. If the proportion of players holding the maximum in-game currency is high, then the firm will never want to set the price below this amount.
Given the discontinuous distributions, finding a closed-form solution for the price is difficult. What one can do is take the previous simulations and calculate the revenue for each player at different prices to find a maximum. The result of this is a profit-maximising price of 66 pulls, which averages a revenue of 0.73 pulls per player.
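That grid search is straightforward to sketch over the simulated players from above, following the fixed-price logic just described:

Code
def revenue_at_price(q: float, players) -> float:
    """Average real-money revenue per player at a fixed price of q pulls."""
    total = 0.0
    for v, z, w in players:
        if v >= q and z >= q - w:     # values the item and can cover the gap
            total += max(q - w, 0.0)  # in-game currency pays the rest
    return total / len(players)

best_q = max(range(1, 101), key=lambda q: revenue_at_price(q, players))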
5.5 Gacha Price - Consumer decision making
This may be the actual approach I need to take to get a profit function, as just looking at the distributions will not get me there. This is also because the distributions don’t tell us about the actual pairs of budgets and valuations each consumer has, which is what actually matters when making the decision to pull.
The consumer’s decision will be based on how likely they think they are to win the item, how large their budget is, and how much they value the item. Using the equations found above we can construct the expected utility function for each player. First, define the utilities for success and failure:
\[
U_a^{success} = v_a + u_z(z_a - \Delta z_a) + u_w(w_a - \Delta w_a) - c(w_a)
\]
\[
U_a^{failure} = u_z(z_a - \Delta z_a) + u_w(w_a - \Delta w_a) - c(w_a)
\]
where \(\Delta z_a\) and \(\Delta w_a\) are the real-money and in-game currency budgets spent on the pulls; \(u_z(\cdot)\) and \(u_w(\cdot)\) are the utilities of the remaining corresponding budgets; and \(c(w_a)\) is the cost of earning in-game currency. Here we assume that the utility of the item is the player’s valuation of it, and that the in-game and real-money currencies have some utility based on possible alternative expenditure (this will be made clearer and more explicit when future items are introduced). The expected utility for each player (where \(x\) is success) is then:
\[
E(U_a) = P(x|b_a)\,v_a + u_z\bigl(z_a - E(\Delta z_a)\bigr) + u_w\bigl(w_a - E(\Delta w_a)\bigr) - c(w_a)
\]
Here, the probability of success is conditional on the budget, and multiplying it by the valuation gives the expected valuation of the item. This probability of success is:
\[
P(x|b_a) = \sum_{k=0}^{b_a-1} P(x\cap k|b_a)
\]
The expected utility is then this expected valuation, plus the utility of the expected remaining budgets, minus the cost of earning in-game currency. \(E(\Delta z_a)\) and \(E(\Delta w_a)\) are the expected real-money and in-game currency amounts spent on the pulls, respectively, which will depend on the probability distribution for success.
It is important to state at some point that we will assume in-game currency is spent before real money, therefore \(E(\Delta w_a) \ge E(\Delta z_a)\) and \(\Delta w_a \ge \Delta z_a\). I will also state that the decision of how much in-game currency to earn is based on the player’s valuation of the item, not on how much they expect to need, as they can always earn more in-game currency if they want to.
The expected number of pulls to get 1 success \(\bar{n}\), conditional on a budget of \(b_a\) pulls (the densities of which were shown in Figure 5), is:
\[
E(\bar{n}|x, b_a) = 1 + \sum_{k=0}^{b_a-1} k\, \frac{P(x\cap k|b_a)}{P(x|b_a)}
\]
where \(P(x \cap k|b_a)\) is given by equation (7). This expected number of pulls for 1 success determines the expected budget spent on the pulls. As the in-game currency is used first, we can state that:
\[
E(\Delta w_a) = \min\bigl(E(\bar{n}|x, b_a),\; w_a\bigr), \qquad E(\Delta z_a) = E(\bar{n}|x, b_a) - E(\Delta w_a)
\]
What can be seen is that if the utility functions are linear then both of these equations amount to the same thing: if your utility is just the sum of your remaining budgets, then the breakdown between them does not matter at all. However, if these utility functions differ, implying that the in-game currency budget and the real-money budget have different uses, then the breakdown does matter. In most gacha games the purchasable currency (which converts into in-game currency for pulls) can be used to purchase secondary items, such as cosmetics, and there are often purchasable items which do not use the in-game currency, such as battle passes, so the assumption that the utility functions for each budget differ is likely an accurate one. However, whether this matters significantly is hard to tell, and it may lead to unnecessary complexity when I am only assessing the gacha system itself and not secondary revenue methods. Therefore, let us ignore this assumption and use a single utility function \(u\) for the entire budget, treating both budgets as perfectly substitutable. Meaning that:
\[
E(U_a) = P(x|b_a)\,v_a + u\bigl(b_a - E(\Delta z_a) - E(\Delta w_a)\bigr) - c(w_a)
\]
Before evaluating this, we should consider the utility of not partaking in the gacha system at all, and thereby not spending any of the budget. This is needed because obtaining the item is not always guaranteed (depending on the size of the budget), and therefore the player may be better off not spending any of their budget at all. The utility of not partaking in the gacha system is simply the utility of the budget:
\[
U_a^{0} = u(b_a)
\]
This merely implies that the expected utility of earning the item would need to be greater than the expected cost.
This condition is unlikely to bind in the case where we only have a single item, as the budget has no use other than gacha, so the utility of the remaining budget is zero. However, all gacha games are ‘live service’ games, which means they are constantly adding new items, so the remaining budget will always have utility. Adding future items to this decision-making process is discussed in the next section. The solution for the case with no future items is to spend the entire budget, as there is no utility from any remaining budget. We can therefore simplify the expected utility function to:
\[
E(U_a) = P(x|b_a)\,v_a - c(w_a)
\]
As the budget is already determined, as long as this expected utility is positive, the player will spend as much of the budget as possible until they either get the item or run out of currency.
From the perspective of the firm, gacha ensures that nearly every player will engage in the gacha system, since not engaging requires a very large cost of earning the in-game currency (assuming they would want the item) and a really low valuation of the item. The customers who still don’t engage in the gacha system wouldn’t buy the item at a fixed price either, as their valuations are low and, given that their in-game currency budget is high, their real-money budget is also low, so the revenue lost from these customers would likely be trivial.
In order to maximise profits, the firm can alter the probabilities. Note that the firm does not care whether the player is successful, as the cost of success is negligible to them; what they care about is how many pulls the player makes. Therefore, the firm will want to reduce the probability of success (conditional on the pity being less than soft pity) to increase the expected number of pulls. This also reduces the player’s probability of success conditional on their budget, reducing their expected utility, thus making this a constraint. The probability of success can be increased by lowering both soft and hard pity, but this may reduce some potential revenue.
We can define the conditional probability distribution using 2 straight lines, with a discontinuity at the soft pity. Below soft pity, the conditional probability is flat at \(\underline{p}\); above soft pity, it rises to 1 at hard pity. Thus:
\[
P(x|k) = \begin{cases}
\underline{p} & \text{if } k < k_{soft} \\
\underline{p} + \frac{(1-\underline{p})}{k_{hard}+1-k_{soft}}(k+1-k_{soft}) & \text{if } k_{soft} \le k \le k_{hard}
\end{cases}
\]
This is the general form of equation (1). However, we want the version conditional on the budget, which is easy: it essentially sets the probability to zero at and above the budget:
\[
P(x|k, b_a) = \begin{cases}
\underline{p} & \text{if } k < k_{soft} \text{ \& } k < b_a \\
\underline{p} + \frac{(1-\underline{p})}{k_{hard}+1-k_{soft}}(k+1-k_{soft}) & \text{if } k_{soft} \le k \le k_{hard} \text{ \& } k < b_a \\
0 & \text{if } k \ge b_a
\end{cases}
\]
We can then use this to find the general form for the joint probability of success and pity, conditional on the budget, which will be needed to find the expected utility function:
\[
P(x \cap k|b_a) = \begin{cases}
\underline{p}(1-\underline{p})^k & \text{if } k < k_{soft} \text{ \& } k < b_a \\
[\underline{p} + \frac{(1-\underline{p})}{k_{hard}+1-k_{soft}}(k+1-k_{soft})]\,(1-\underline{p})^{k}\,\Pi^{k-1}_{i=k_{soft}}(\frac{k_{hard}-i}{k_{hard}+1-k_{soft}}) & \text{if } k_{soft} \le k \le k_{hard} \text{ \& } k < b_a \\
0 & \text{if } k \ge b_a
\end{cases}
\]
We can simplify the product at the end of the 2nd case with some evaluation, given that the denominator is a constant and the numerator is of the form \(A-i\) where \(i\) is an integer:
\[
P(x \cap k|b_a) = \begin{cases}
\underline{p}(1-\underline{p})^k & \text{if } k < k_{soft} \text{ \& } k < b_a \\
[\underline{p} + \frac{(1-\underline{p})}{k_{hard}+1-k_{soft}}(k+1-k_{soft})]\frac{(k_{hard}-k_{soft})!\,(1-\underline{p})^{k}}{(k_{hard}-k)!\,(k_{hard}+1-k_{soft})^{k-k_{soft}}} & \text{if } k_{soft} \le k \le k_{hard} \text{ \& } k < b_a \\
0 & \text{if } k \ge b_a
\end{cases}
\]
To find the expected utility we need to sum this probability over all integer values of \(k\) from 0 to \(b_a-1\). To make this easier, start with the case where \(b_a \le k_{soft}\), i.e. where the player has a budget below soft pity. In this case, we can just sum the first term:
\[
P(x|b_a) = \sum_{k=0}^{b_a-1} \underline{p}(1-\underline{p})^{k} = 1 - (1-\underline{p})^{b_a}
\]
This probability is 1 minus the probability of never winning, i.e. the probability of winning at least once, which is exactly what the sum of these probabilities represents. With this knowledge, the case where the budget exceeds the soft pity becomes easier, as we only have to subtract the probability of never winning (conditional on the budget) from 1.
At this point, I’ve realised I may need to take the gacha system all the way to its simplest form, which means removing any pity system and therefore allowing the distribution to have an infinite domain. I’ve made this decision as I keep getting rather intractable maths, which clearly has solutions (I think finite, and even unique ones), but none of them are particularly closed form. For example, a probability with 2 factorials in it is difficult to work with further.
If we define the conditional probability of success, with no pity system, as:
\[
P(x|k) = P(x) = \underline{p}
\]
Then the joint probability of success and pity, conditional on the budget, is:
\[
P(x \cap k|b_a) = \begin{cases}
\underline{p}(1-\underline{p})^{k} & \text{if } k < b_a \\
0 & \text{if } k \ge b_a
\end{cases}
\]
Using the simplified expected utility function, with no future items and linear utility, we can find the expected utility function with no pity system:
\[
E(U_a) = \bigl[1 - (1-\underline{p})^{b_a}\bigr]v_a - c\,w_a
\]
Like before, this equation shows that the expected utility for the consumer is the valuation of the item multiplied by the probability of winning it (which is 1 minus the probability of never winning it), minus the cost of earning in-game currency. We can view this as a participation constraint, used by the firm to balance the number of players willing to engage in the gacha system against minimising the probability of winning.
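A sketch of this participation check under linear utility and no pity (the cost rate below is an arbitrary illustrative value):

Code
def expected_utility(v: float, w: float, b: float, p: float = 0.01, c: float = 0.2) -> float:
    """E(U) = v(1 - (1-p)**b) - c*w, with c = 0.2 purely illustrative."""
    return v * (1 - (1 - p) ** b) - c * w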
Given that the set of players is \(A\), with a size of \(\bar{A}\), we can denote the set of players willing to engage in the gacha system as \(A_{Gacha}\), with a size of \(\bar{A}_{Gacha}\). This set contains all players who satisfy the participation constraint, and therefore have a positive expected utility. As the probability of success increases, this set grows, increasing the number of consumers but reducing the expected revenue per consumer, and vice versa.
Recall that the cost function for in-game currency is assumed proportional to the amount of in-game currency earned, at a rate of \(c\in (0, 1)\), and that the optimal level of in-game currency is given by:
\[
w_a = \min\bigl(\max(v_a - z_a,\; w_{min}),\; w_{max}\bigr)
\]
Given that \(v_a\) and \(z_a\) have known distributions, the firm will set \(\underline{p}\) to maximise the expected revenue. The expected revenue per player is the expected number of pulls to get exactly one success, which, under the simpler geometric distribution, is given by:
\[
E(\bar{n}|x) = \sum_{n=1}^{\infty} n\, \underline{p}(1-\underline{p})^{n-1} = \frac{1}{\underline{p}}
\]
The budget-constrained version is very similar: the distribution is simply zero beyond the budget, with the player pulling until success or exhaustion. After substituting in for the budget, we can conclude that the expected revenue per player is:
\[
E(R_a) = \sum_{n=1}^{b_a} n\, \underline{p}(1-\underline{p})^{n-1} + b_a(1-\underline{p})^{b_a} = \frac{1-(1-\underline{p})^{b_a}}{\underline{p}}
\]
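In code this is a one-liner; a sketch consistent with the expression above:

Code
def expected_revenue(b: float, p: float = 0.01) -> float:
    """Expected pulls sold to a player with budget b who pulls until
    success or exhaustion: E[min(N, b)] for N geometric with rate p."""
    return (1 - (1 - p) ** b) / p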
As I have already said, lowering the probability of success will increase the expected revenue per consumer, but will reduce the number of consumers willing to engage in the gacha system. To analyse this, we will look at the expected utility of the gacha system through the equations above.
Start with the lower bound of \(w_{min}\) in-game currency, where players typically have lower valuations and/or high real-money budgets. They have no need for additional in-game currency, so their expected utility is strictly positive (for any positive valuation), as there is no cost of earning in-game currency:
\[
E(U_a) = \bigl[1-(1-\underline{p})^{z_a + w_{min}}\bigr]v_a > 0
\]
If the population consisted entirely of these customers then, in Nash equilibrium, the firm would set the probability of success equal to zero, leaving all consumers with zero utility. This kind of equilibrium is probably very familiar to game theorists, not because it exists in other models, but because the outcome is kind of pointless (consumers are engaging in a thing for literally no gain, yet it is the equilibrium).
Next, let us consider the opposite end of the in-game currency range, where the consumers max it out. These consumers typically have higher valuations of the item and/or low real-money budgets, and therefore will need to earn the maximum in-game currency to expect to win the item. These consumers have an expected utility of:
\[
E(U_a) = v_a\left(1-(1-\underline{p})^{b_a}\right) - c\,w_{max}
\]
If the expected utility is weakly positive, then the firm will earn the expected revenue given above, but if the expected utility is negative, then the firm earns nothing. Therefore, we can evaluate the total revenue as the sum of the expected revenue over the participating players:
\[
R = \sum_{a\in A_{Gacha}} E(\bar{n}_a|x)
\]
The consumer’s participation constraint can be interpreted as requiring the expected valuation of the item (valuation multiplied by the probability of winning) to exceed the cost of earning the additional in-game currency:
\[
v_a\left(1-(1-\underline{p})^{b_a}\right) \geq c\hat{w}_a
\]
but it can also be written as:
\[
1-(1-\underline{p})^{b_a} \geq \frac{c\hat{w}_a}{v_a}
\]
which can be interpreted as requiring the probability of winning to exceed the ratio of the cost of earning the in-game currency to the valuation of the item. The reason to highlight this interpretation is that the probability of winning is part of the firm’s revenue function.
There is no analytical way to solve this (that I know of), as it all depends on the distributions of \(v_a\) and \(z_a\). What we can do is generate a sample from the distributions and find the expected revenue at different levels of \(\underline{p}\). Luckily, we have already created this sample in the appendix, so we can use that to find the expected revenue.
The graph above shows that the firm should set the probability of success as low as possible to maximise revenue. But won’t this cause a massive number of consumers not to participate? It turns out that the cost of earning the needed additional in-game currency is simply not large enough to outweigh the expected utility of the item.
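A sketch of the exercise behind that graph, with stand-in distributions for \(v_a\) and \(z_a\); the distribution shapes and all parameter values below are placeholder assumptions rather than the sample actually constructed in the appendix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in distributions for valuations v_a and real-money budgets z_a;
# the lognormal/uniform shapes and every parameter value are placeholders.
n_players = 10_000
v = rng.lognormal(mean=4.2, sigma=0.5, size=n_players)  # valuations
z = rng.integers(10, 150, size=n_players)               # budgets, in pulls

c, w = 0.1, 30   # cost rate and earned in-game currency (illustrative)
budget = z + w   # total pulls affordable

def expected_revenue(p):
    p_win = 1 - (1 - p) ** budget
    participates = v * p_win - c * w >= 0  # participation constraint
    exp_spend = p_win / p                  # expected pulls, stopping at a win or the budget
    return (exp_spend * participates).sum(), participates.mean()

for p in (0.001, 0.005, 0.01, 0.02, 0.05):
    revenue, share = expected_revenue(p)
    print(f"p = {p:.3f}: participating = {share:.1%}, revenue = {revenue:,.0f}")
```

Under these placeholder distributions participation stays high even at very low \(\underline{p}\), so revenue is highest at the lowest probability, in line with the claim above.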
5.7 Adding Future Items
The reason for including the next available item is that gacha games often announce all limited-time items in a patch before the patch is released, and each limited-time item is available during one half of the patch; therefore, the value of both items informs the decision-making process. For example, if you quite like the current item but really like the next one then, with a fixed budget, you are far more likely to skip the current item (or at least spend less on it) in order to improve your chances of getting the next one. This is what gives your remaining budget utility.
This means that we will not drop terms in our expected utility function:
Here, I have added a time index for periods 1 and 2, where the first period is the current item and the second period is the next item. I will assume that there are only 2 items, so any remaining budget after the second item is obtained (if it is obtained) will yield no utility, just as in the 1-period case above. This means that the expected utility function for the second period is what we have already found, but dropping the cost of earning in-game currency, as this was already incurred in period 1.
This could be expanded to any finite number of items, but usually firms will only release information about (and indeed develop) a few items at a time, meaning that the expected utility for items beyond these is not known. We could alternatively set this as some constant, as consumers would assume that future items will exist and that they will get some utility from them. I don’t think this would change the analysis much, but I can check later.
The expected utility function for the second period is:
Note that the budget is what remains after period 1, which will range from possibly nothing (which would be really bad luck) to possibly the entire budget, if they did not try to get item 1. We can substitute this expected utility above to get the overall expected utility function:
This gives us a pretty simple utility function: it is the sum of the expected utility of each item, given the budgets, minus the cost of earning in-game currency. As we are using the geometric distribution for the probability of success, we can simplify the above function to:
\[
E(U_a) = v_{1a}\left(1-(1-\underline{p})^{b_{1a}}\right) + v_{2a}\left(1-(1-\underline{p})^{b_{2a}}\right) - c\hat{w}_a
\]
As the budgets must sum to the total budget \((b_{2a}=b_a-b_{1a})\), we can maximise the expected utility with some easy calculus. The first order condition is:
\[
v_{1a}(1-\underline{p})^{b_{1a}} = v_{2a}(1-\underline{p})^{b_{2a}} \quad\iff\quad \frac{v_{1a}}{v_{2a}} = (1-\underline{p})^{b_{2a}-b_{1a}}
\]
This shows that the ratio of the valuations of the items \(\frac{v_{1a}}{v_{2a}}\) will be equal to the ratio of the probabilities of never winning each item within its allocated budget, \((1-\underline{p})^{b_{2a}-b_{1a}}\). Basically, the higher the relative valuation, the higher the relative budget. Solving the first order condition together with the budget constraint gives the budgets:
\[
b_{1a} = \frac{b_a}{2} - \frac{\ln(v_{1a}/v_{2a})}{2\ln(1-\underline{p})}, \qquad b_{2a} = \frac{b_a}{2} + \frac{\ln(v_{1a}/v_{2a})}{2\ln(1-\underline{p})}
\]
We can interpret these as a divergence from a 50% split of the budget. If item 1 is preferred, then \(v_{1a} \gt v_{2a}\), which implies that \(b_{1a} \gt b_{2a}\), as \(\ln(v_{1a}/v_{2a}) \gt 0\) and \(\ln(1-\underline{p})\lt0\), and vice versa. We can also show that the difference between the budgets is:
\[
b_{1a} - b_{2a} = -\frac{\ln(v_{1a}/v_{2a})}{\ln(1-\underline{p})} = \frac{\ln(v_{1a}/v_{2a})}{\ln\left(\frac{1}{1-\underline{p}}\right)}
\]
I’m not sure the second form is any better than the first, but they’re basically the same. At this point, I’m starting to suspect that the NE will not be very different from the one-item case.
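As a small numeric check of the split formula (assuming an interior solution; a real implementation would clip negative budgets to zero, which I ignore here):

```python
import numpy as np

# Numeric check of the two-item budget split, assuming an interior
# solution and the geometric success probability.
# Parameter values are illustrative.
p = 0.01
b_total = 120        # total budget, in pulls
v1, v2 = 60.0, 40.0  # valuations of items 1 and 2

shift = np.log(v1 / v2) / (2 * np.log(1 - p))  # negative, since ln(1-p) < 0
b1 = b_total / 2 - shift
b2 = b_total / 2 + shift
print(b1, b2)  # b1 > b2, since item 1 is preferred

# Verify the first order condition v1*(1-p)^b1 = v2*(1-p)^b2:
print(v1 * (1 - p) ** b1, v2 * (1 - p) ** b2)  # equal up to rounding
```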
Requiring the overall expected utility above to be weakly positive gives a similar participation constraint to the one-item case, just weighted between the two items depending on the valuations.
We do need to update the revenue function to take the two items into account:
\[
R = \sum_{a\in A_{Gacha}} \left[ E(\bar{n}_1|x, b_{1a}) + E(\bar{n}_2|x, b_{2a}) \right]
\]
Here, we have the sum of the expected revenue from item 1, conditional on the budget assigned to item 1, and the expected revenue from item 2, conditional on the budget assigned to item 2.
6.1 Gacha without Pity and Loot Boxes
Here I will talk about gacha without pity, and loot boxes (which are pretty similar). These tend to follow simple geometric distributions, so are relatively easy to work with.
6.2 Game Quality
The idea here is that the firm can invest more in the development of the game in order to create a better game (more development does not strictly mean a better game, of course, but assume a general positive correlation). A better game will then influence the demand for the items in the game, by affecting the real-money budgets of players and, in turn, increasing the revenue. This would increase revenue in both a fixed-price system and a gacha system, by increasing the price/expected price that players pay, but I think the increase would be higher in the fixed-price system (need to check). In this scenario, we should treat the players’ real-money budgets as an endogenous variable, dependent on the quality of the game. The idea is that the better the game, the more enjoyment players get, and the more they would be willing to pay for items in the game. This happens quite frequently in free games where the only source of monetisation is transactions within the game (such as skins, gacha, battle passes, etc.), where players will base their in-game purchases on how much they would have been willing to pay for the game if it wasn’t free. Some players even see this as generosity towards the game producers: they liked a game the producers made and want to fund their future projects. Some players also feel like they owe the firm some money, as they enjoyed the game and therefore the firm deserves some compensation.
7 Conclusion
8 References
9 Appendix
9.1 Simulation
To illustrate the pity system in gacha, we can simulate the process of pulling 10,000 times and record the number of times each rarity of item is obtained. The simulation can be played, paused, and restarted using the buttons below, and the bar can be slid manually to view the results at a specific pull number.
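For readers without the interactive version, a minimal offline sketch of the same pity mechanics (1% base rate, soft pity at 74, hard pity at 100) might look like this:

```python
import random

def pull_probability(pity):
    """Conditional probability of a legendary at the given pity:
    1% base rate, linear ramp from soft pity (74) to hard pity (100)."""
    if pity < 74:
        return 0.01
    return 0.01 + 0.99 * (pity + 1 - 74) / 26

def simulate(n_pulls, seed=0):
    rng = random.Random(seed)
    pity, pities_at_win = 0, []
    for _ in range(n_pulls):
        if rng.random() < pull_probability(pity):
            pities_at_win.append(pity)  # record the pity at each legendary
            pity = 0
        else:
            pity += 1
    return pities_at_win

wins = simulate(10_000)
print(len(wins), sum(wins) / len(wins))  # legendary count and mean pity
```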
I have also included some statistics from this simulation above. The first table shows a running total for each rarity as the number of pulls increases. It also shows how the pity changes over time, to illustrate how the pity system works. The second table shows final totals, as well as the minimum, maximum, and average number of pulls required to obtain a rare and a legendary item. In addition, the distribution of the pity for legendary items is graphed in Figure 13, against the predicted counts based on Figure 3.
The table below summarises the legendary pity statistics from the simulation and compares them to the predictions.
Table 5: Descriptive Statistics for Legendary Pity from Simulation and Prediction

|            | Mean Pity | Modal Pity | 25th Percentile Pity | Median Pity | 75th Percentile Pity |
|------------|-----------|------------|----------------------|-------------|----------------------|
| Simulation | 54.37     | 78         | 38                   | 72          | 79                   |
| Prediction | 55.26     | 77         | 28                   | 68          | 77                   |
The simulated data and predictions are very well matched, with the exception of the 25th percentile, which lies in the middle of the region with the most variance, due to such a low probability of success². Looking at the simulated data, there are fewer successes between 0 and 33 than between 34 and the median of 68, which is why the 25th percentile is 10 pity higher in the simulation than in the prediction.
To test whether this 25th-percentile difference is a one-off or a consistent bias, I will run the simulation 100 times and compare the 25th percentile of the pity in each run to the prediction. The distribution of the 25th percentile can be seen in Figure 14.
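A sketch of that experiment, reusing the same pity mechanics as the sketch above; the run length of 10,000 pulls per run is an assumption on my part:

```python
import random
import numpy as np

def pull_probability(pity):
    # 1% base rate, ramping linearly to certainty at the hard pity of 100.
    return 0.01 if pity < 74 else 0.01 + 0.99 * (pity + 1 - 74) / 26

def pities_for_run(n_pulls, rng):
    pity, wins = 0, []
    for _ in range(n_pulls):
        if rng.random() < pull_probability(pity):
            wins.append(pity)
            pity = 0
        else:
            pity += 1
    return wins

rng = random.Random(42)
q25 = [np.percentile(pities_for_run(10_000, rng), 25) for _ in range(100)]
print(np.mean(q25), np.min(q25), np.max(q25))  # spread of the 25th percentile
```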
One can see that the range of values for the 25th percentile includes both the expected value (28), towards the middle, and the value from the simulation (38), towards the tail. This suggests that the latter was indeed an outlier, and that the simulation is consistent with the predictions. Further, the variance of the 25th percentile and the median is larger than that of the 75th percentile, as hypothesised. One can explain this by pointing to the variance of the geometrically distributed section of the probability distribution, on which the 25th percentile and median lie. If this geometric distribution were continued to infinity (as it properly would be), the variance would be \(\frac{1-p}{p^2}\), which is large for small probabilities. This gives a reason, if not a mathematically exact or rigorous one, for the larger variance in the 25th percentile and median, and the smaller variance in the 75th percentile.
9.2 In-Game Currency Simulation
Below are the histograms for the in-game currency distribution, simulated for 10,000 players. They show both the unrestricted distribution, with no minimum or maximum on in-game currency, and the restricted version used in the model, which imposes \(w_{min}=20\) and \(w_{max}=70\).
Below is a similar histogram for the in-game currency distribution, but for the quadratic cost function, which is given by:
\[
cost = c\hat{w}_a^2 \text{ where } c=0.01
\]
One can see that the quadratic cost means that many fewer players will be willing to earn the maximum level of in-game currency, and therefore the density at the maximum level is shifted down into the lower levels. The unrestricted model also shows the upper levels of in-game currency being less desired, and increased density at lower levels. Both seem to show some concavity (at least locally) in the distribution, instead of the uniformity seen with the linear cost function.
In-Game Currency Distribution Simulation with Quadratic Cost Function
Below is the histogram for the full budget distribution, simulated for 10,000 players: the sum of the in-game currency and the real-money budget, minus the cost of earning the in-game currency. The linear cost was used here for simplicity, and because the convex costs are not too dissimilar.
We use the restricted in-game currency, so the budget must be weakly larger than \(w_{min}=20\). The budget can exceed \(w_{max}=70\) due to real-money budgets, which implies that any excess over \(w_{max}\) is due to real money. We cannot, though, parse the exact breakdown between in-game currency and real money.
Footnotes
The quantiles were calculated using \(\frac{\ln(1-q)}{\ln(1-p)}\), where \(q\) is the quantile, and \(p\) is the probability of success. These were then rounded up to the nearest integer.↩︎
Not 100% sure about this, but my gut tells me that the low probability of these events causes very large variance, which can lead to the 25th percentile being more volatile than other percentiles. In fact, this carries through for any percentile below the median; the median itself is very close to the soft pity, with its higher-likelihood events, and is therefore less volatile.↩︎
Source Code
---title: "Pricing in Gacha Games"author: "Alex Ralphs"toc: truenumber-sections: truebibliography: references.biblink-citations: truelink-bibliography: truecrossref: eq-prefix: ""highlight-style: pygmentsexecute: echo: falseformat: html: code-tools: true code-fold: true html-math-method: mathjax smooth-scroll: true page-layout: full embed-resources: true # pdf: # geometry: # - top=30mm # - left=20mm # docx: defaultjupyter: python3editor: render-on-save: trueeditor_options: chunk_output_type: inlinetheme: light: [flatly, lightmode.scss] dark: [darkly, darkmode.scss]---<head><!-- Remote stylesheet --> <link rel="stylesheet" href="styles.css"></head>## AbstractThis paper will explore the pricing model of 'gacha games', a genre of predominantly mobile games who's primary monetization method is the sale of 'gacha' draws. These draws allow players to trade for a random in-game item, with associated rarities corresponding to their probabilities. something, something... lotteries, something, something... technically not gambling, something, something... ethics?## IntroductionWhilst 'Gacha' is generally understood to be a form of gambling, this is not strictly true. Both gacha and gambling involve exchanging money for the probability of wining a 'prize', the key difference is in the payoff: gambling wins you money (more often than not a negative amount); but gacha wins you in-game items. Being lucky when gambling gives you the means to gamble more: you increase the money you have meaning you have more money to place on the next bet. Whereas being lucky in gacha gives you the means to gacha less. ## Literature ReviewI hope no one has done this before.## TheoryGacha is a series of lotteries: each pull has a set of possible outcomes, and each outcome has a probability of being drawn. Let us call each pull $L_k$, where $k\in \mathbb{N}_{++}$ is the number of the pull. Let the set of outcomes (i.e. the in-game items) be a fixed, finite set $X=[x_1, x_2, x_3, ..., x_n]$. Each outcome $x_i\in X$ has a probability of being drawn $p_i$. The discrete probability distribution over the outcomes is given by $P(x)$. To illustrate, consider the following example: ::: {style="margin-left: 2em;"}*Imagine that you want a new sword for your character, so let us assume the items are all swords. Next, assume swords come in 3 different rarities: common (call it a **Short Sword**), which is drawn at a probability of 90%; rare (call it an **Uchigatana**), which is drawn with a probability of 9%; and legendary (call it the **Sword of Night**), which is drawn with a probability of 1%.*:::Clearly, the likelihood of obtaining the rarest, and often best, items is very low, thus players generally need to pull many times before obtaining the *Sword of Night* which they desire. Furthermore, pulling more times only increases the probability of obtaining the *Sword of Night*, it does not guarantee it. This unfortunate fact lead to the next section, which addresses this issue.### PityIf you really wanted the *Sword of Night*, it would really suck if you pulled over and over, 100+ times, and still didn't get it. Whilst this is still a possibility in some gacha games, nearly all implement a 'Pity system', which effectively sets the maximum limit on the number of unsuccessful pulls. This 'hard pity' is determined by the probability of the rarest items, the cost of a pull, and the desired profit margin of the game developers. A typical number for 'hard pity' is around 100 pulls, so for our example let us choose 100 exactly. 
The so called 'hard pity' is not the only pity system in place, as the likelihood of the rarest items will increase the closer to the 'hard pity' you get. The number of pulls required to start the increase in probability is called the 'soft pity'. This is generally around 70-80 pulls, so let us choose 75. By law, game developers must disclose the probabilities of each item, but they are not required to disclose the 'soft pity' numbers, nor the probabilities of the items after this point. Thus, let us assume some linear increase in probability, from 1% at 74 pulls to 100% at 100 pulls. Such a probability function, conditional on the pity $k$, where the event $x$ is success, would be:$$P(x|k) = \begin{cases} 0.01 \quad \text{for } 0\leq k\lt74 \\ 0.01 + \frac{0.99}{26} (k+1-74) \quad \text{for } 74\leq k\leq 99 \quad \end{cases}$$ {#eq-cond-prob}Using this we can graph the probabilities (conditional on no successes) for the *Sword of Night* in @fig-cond-prob below. Note that the x-axis is the number of pity (given by $k$), which will always be 1 less than the number of pulls (given by $n$). ```{ojs}//| label: fig-cond-prob//| fig-cap: 'Conditional Probability of Success, P(x|k)'{ let Freq_data = [] for (var i=0; i<=99; i++){ if (i<74){ Freq_data.push({'Pity': i, 'Probability': 0.01}) } else { Freq_data.push({'Pity': i, 'Probability': 0.01 + 0.99*(i-73)/26}) } } const plot = Plot.plot({ x: { label: 'Pity, (k)', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) }, y: { label: 'Probability, P(x|k)', }, marks: [ Plot.ruleX(Freq_data, {x: 'Pity', y: 'Probability', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}), Plot.tip(Freq_data, Plot.pointerX({x: 'Pity', y: 'Probability', fill: "var(--bs-body-bg)"})), Plot.frame(), ], width: width, }) return plot}```The second rarest items generally also have a pity system, where you are guaranteed to receive at least 1 of them every 10 pulls or so. The number of pulls to be guaranteed a rare item is openly stated, and is usually 10. Further, this guarantee is inclusive of rarer items as well, meaning that every 10 pulls you are guaranteed at least 1 rare item, but it could be a legendary item. The probability that this rare item will be of legendary rarity is usually unchanged (1%), implying that the probability of a common item is transferred to the rare item (90%+9%=99%). From the above distribution we can also calculate the expected number of pulls required to obtain the *Sword of Night*, first by finding the expected pity, and inferring that the expected number of pulls is 1 more than this. Call the number of pulls for a success $\bar{n}$. The expected pity, conditional on success, is given by the sum of the probability of $k$ being the first success, weighted by k:$$E(k+1|x) = 1 + \sum_{i=1}^{k} k P(k|x)$$ {#eq-exp-pity-plus-one}Note that the probability of success $x$ after $k+1$ pulls is dependant the probability of $k$ previous failures. Thus, the probability of success in pull $k+1$ is given by:$$P(x\cap k) = P(x|k)P(k) = \begin{cases} P(x|0) \qquad\qquad\qquad\qquad \text{ for } k=0 \\ P(x|k)\Pi_{i=0}^{k-1} (1-P(x|i)) \quad \text{for } k>0 \end{cases}$$ {#eq-joint-prob}In the case where the pity is zero, the probability of success is simply the conditional probability of success in the first pull, as the probability of zero pity is 1. All subsequent probabilities of success are conditional on the previous failures, and a thus defined slightly differently. 
In every case, as each pull is independent, we can deduce that $P(k)=P(k-1)(1-P(x|k-1))$ for $k>0$.As the probability of success is 1% below 75 pulls, and follows equation (number) from 75 pulls upwards, we can input these into the probability equation:$$P(x\cap k) = P(x|k)P(k) = \begin{cases} 0.01*(0.99)^k \qquad\qquad \text{for } 0\leq k\lt 74 \\ (0.01+\frac{0.99(k+1-74)}{26})*(0.99)^{73}*\Pi_{i=74}^{k}(1-0.01-\frac{0.99(i-74)}{26}) \quad \text{for } 74\leq k\leq 99 \end{cases}$$ {#eq-joint-prob-2}This can be shown graphically in @fig-joint-prob.```{ojs}//| label: fig-joint-prob//| fig-cap: 'Joint Probability of Success and Pity, P(x ∩ k)'{ let Prob_data = [] let prob_x_k = 0 for (var k=0; k<=99; k++){ if (k<74) { prob_x_k = 0.01*(0.99**k) Prob_data.push({'Pity': k, 'Probability': prob_x_k}) } else { let prob_failure_k_74 = 0.99**73 for (var i=74; i<=k; i++){ prob_failure_k_74 *= (0.99-0.99*(i-74)/26) } prob_x_k = (0.01+0.99*(k-73)/26)*prob_failure_k_74 Prob_data.push({'Pity': k, 'Probability': prob_x_k}) } } const plot_1 = Plot.plot({ x: { label: 'Pity, (k)', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) }, y: { label: 'Probability, P(x ∩ k)', domain: [0, 0.08] }, marks: [ Plot.ruleX(Prob_data, {x: 'Pity', y: 'Probability', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}), Plot.tip(Prob_data, Plot.pointerX({x: 'Pity', y: 'Probability', fill: "var(--bs-body-bg)"})), Plot.frame(), ], width: width, }) return plot_1}{ let Prob_data_sum = [] let prob_x_k = 0 let prob_x_k_sum = 0 for (var k=0; k<=99; k++){ if (k<74) { prob_x_k = 0.01*(0.99**k) prob_x_k_sum += prob_x_k Prob_data_sum.push({'Pity': k, 'Probability': prob_x_k_sum}) } else { let prob_failure_k_74 = 0.99**73 for (var i=74; i<=k; i++){ prob_failure_k_74 *= (0.99-0.99*(i-74)/26) } prob_x_k = (0.01+0.99*(k-73)/26)*prob_failure_k_74 prob_x_k_sum += prob_x_k Prob_data_sum.push({'Pity': k, 'Probability': prob_x_k_sum}) } } const plot_2 = Plot.plot({ x: { label: 'Pity, (k)', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) }, y: { label: 'Cumulative Probability, P(x ∩ k)', domain: [0, 1] }, marks: [ Plot.ruleX(Prob_data_sum, {x: 'Pity', y: 'Probability', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}), Plot.tip(Prob_data_sum, Plot.pointerX({x: 'Pity', y: 'Probability', fill: "var(--bs-body-bg)"})), Plot.frame(), Plot.ruleY([0.25], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}), Plot.ruleY([0.5], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}), Plot.ruleY([0.75], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}), ], width: width, }) return plot_2}```The joint probability of success and pity can be seen to fall slowly, at a factor of 0.99, until the soft pity of 74, where it rises sharply, due to an increased (conditional) probability of success, and then falls back down due to a decreasing probability of reaching such high levels of pity. This decreasing probability is precisely due to the fact that the (conditional) probability of success is rising, and thus the (conditional) probability of failure is decreasing.We can make some further inferences from this join probability distribution. The probability to obtain the *Sword of Night* in the first 74 pulls is `{python} round(100*sum([0.01*(0.99**k) for k in range(74)]), 2)`%. 
This can be calculated in two ways, either by subtracting the probability of no successes $(0.99^{74})$ from 100%, or one can sum the joint probabilities from the above figure from 0 to 73 pity:$$P(x|k<74) = 1-0.99^{74} = \sum_{i=0}^{73} 0.01(0.99)^i$$ {#eq-prob-74}To prove this is true, note that 0.01 = 1-0.99 thus:$$\sum_{i=0}^{73} 0.01(0.99)^i = \sum_{i=0}^{73} (1-0.99)(0.99)^i = \sum_{i=0}^{73} 0.99^i - \sum_{i=0}^{73} 0.99^{i+1}$$All but 1 term from each sum cancels each other out, only leaving $0.99^{0}-0.99^{74}$, and hence @eq-prob-74. From the joint probability it is a simple step to get to the probability of pity $k$, conditional on success (thanks Bayes):$$P(k|x) = \frac{P(x\cap k)}{P(x)}$$But what is the probability of success $P(x)$? This probability can be difficult to find in practice, but luckily it's rather simple: If you pull 100 times, you are guaranteed to succeed, thus $P(x)=1$. I made a slightly sneaky assumption in the above calculations, and that is that the probability of success 100%, and I did this by assuming that one would pull until they succeeded. This reason for this, is that the domain for the pity should be complete, which requires enough pulls to happen to allow for all values of pity to be possible: if you only do 50 pulls then it's impossible to have a pity above 50. More on this later. Assuming still that we will keep pulling until success (and no more), we can calculate the expected number of pulls required to obtain the *Sword of Night* as:$$E(\bar{n}|x) = 1 + \sum_{i=0}^{k} k P(k|x) = 1 + \sum_{i=0}^{k} k P(x\cap k)$$ {#eq-exp-pulls}```{python}import numpy as npexp_n_x =round(sum([k*0.01*(0.99**k) for k inrange(74)]) +sum([k*(0.01+0.99*(k-73)/26)*(0.99**73)*np.prod([1-0.01-0.99*(i-74)/26for i inrange(74, k+1)]) for k inrange(74, 100)]) +1, 2)```These densities can be seen in @fig-joint-density, the sum of which (plus 1) will yield our expected number of pulls for success. This value for our example is `{python} exp_n_x`. This is the number of pulls you should expect to make in order to get the *Sword of Night*, and therefore the basis for the price of the item. ```{ojs}//| label: fig-joint-density//| fig-cap: 'Density, P(x ∩ k)*k'{ let Density_data = [] let prob_x_k = 0 let den_k = 0 let den_k_sum = 0 for (var k=0; k<=99; k++){ if (k<74) { prob_x_k = 0.01*(0.99**k) den_k = prob_x_k*k den_k_sum += den_k Density_data.push({'Pity': k, 'Density': den_k}) } else { let prob_failure_k_74 = 0.99**73 for (var i=74; i<=k; i++){ prob_failure_k_74 *= (0.99-0.99*(i-74)/26) } prob_x_k = (0.01+0.99*(k-73)/26)*prob_failure_k_74 den_k = prob_x_k*k den_k_sum += den_k Density_data.push({'Pity': k, 'Density': den_k}) } } const plot = Plot.plot({ x: { label: 'Pity', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) }, y: { label: 'Density, P(x ∩ k)*k', }, marks: [ Plot.ruleX(Density_data, {x: 'Pity', y: 'Density', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}), Plot.tip(Density_data, Plot.pointerX({x: 'Pity', y: 'Density', fill: "var(--bs-body-bg)"})), Plot.frame(), ], width: width, }) return plot}```The mean does not tell us the entire story, as the distribution is by no means symmetric: it is heavily weighted above a pity of 74. Using @fig-joint-prob, the median can be seen to be 68, with 25% and 75% percentiles of 28 and 78 respectively, and the mode is also 78. **A brief aside on the number of pulls**In the previous section, I assumed that the number of pulls is sufficiently large to guarantee success. 
This sets the probability of success to 1, and allows for the full range of pity to be possible. However, I want to consider the case where this is not true; where the number of pulls may not be sufficient to guarantee success. This can still be reflected, but requires truncation of the distributions depending on the number of pulls, and therefore a more complex probability of success. Instead of a unconditional probability of success, it will be conditional on the number of pulls, but just not conditional on the pity. This above probabilities will be the same, but also conditioned on the number of pulls, with the exception of when the pity is above the number of pulls, in which case the probability of success is 0. The conditional probability of success does not change, when also conditioning on n, the probability that is changing is the probability of reaching a pity of $k$. Therefore, $P(x|k, n)$ is the same as $P(x|k)$, and given by @eq-cond-prob. This implies that $x$ is independent of $n$.The joint probability of success and pity, conditional on the number of pulls, is given by @eq-joint-prob-n. This is the same as @eq-joint-prob, but with the added condition that $k<n$. As the probability of $k$ being equal to or larger than the number of pulls $n$ is 0, the joint probability of success and pity is 0 for $k\geq n$: $$P(x\cap k|n) = P(x|k, n)P(k|n) = $$$$P(x\cap k|n)=\begin{cases} \begin{cases} 0.01*(0.99)^k \qquad\qquad \text{for } 0\leq k\lt 74\\ (0.01+\frac{0.99(k+1-74)}{26})*(0.99)^{73}*\Pi_{i=74}^{k}(0.99-\frac{0.99(i-74)}{26}) \quad \text{for } 74\leq k\leq 99 \end{cases} \quad \text{for } k\lt n \\ 0 \qquad \text{for } k\geq n \end{cases}$${#eq-joint-prob-n}This can be seen in @fig-joint-prob-n, where the x-axis is still the pity, like in @fig-joint-prob, but one can use the slider to see the effect of changing the number of pulls.:::{style="display: flex"}::: {style="display: inline-block;"}```{ojs}//| label: fig-joint-prob-n//| fig-cap: 'Joint Probability of Success and Pity, P(x ∩ k|n)'{ let Prob_data_n = [] let prob_x_k_n = 0 for (var k=0; k<=99; k++){ if (k<74 && k<n) { prob_x_k_n = 0.01*(0.99**k) Prob_data_n.push({'Pity': k, 'Probability': prob_x_k_n}) } else if (k>=74 && k<n) { let prob_failure_k_n_74 = 0.99**73 for (var i=74; i<=k; i++){ prob_failure_k_n_74 *= (0.99-0.99*(i-74)/26) } prob_x_k_n = (0.01+0.99*(k-73)/26)*prob_failure_k_n_74 Prob_data_n.push({'Pity': k, 'Probability': prob_x_k_n}) } else { Prob_data_n.push({'Pity': k, 'Probability': 0}) } } const plot_1 = Plot.plot({ x: { label: 'Pity, (k)', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) }, y: { label: 'Probability, P(x ∩ k)', domain: [0, 0.08] }, marks: [ Plot.ruleX(Prob_data_n, {x: 'Pity', y: 'Probability', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}), Plot.tip(Prob_data_n, Plot.pointerX({x: 'Pity', y: 'Probability', fill: "var(--bs-body-bg)"})), Plot.frame(), ], width: width, }) return plot_1}{ let Prob_data_n_sum = [] let prob_x_k_n = 0 let prob_x_k_n_sum = 0 for (var k=0; k<=99; k++){ if (k<74 && k<n) { prob_x_k_n = 0.01*(0.99**k) prob_x_k_n_sum += prob_x_k_n Prob_data_n_sum.push({'Pity': k, 'Probability': prob_x_k_n_sum}) } else if (k>=74 && k<n) { let prob_failure_k_n_74 = 0.99**73 for (var i=74; i<=k; i++){ prob_failure_k_n_74 *= (0.99-0.99*(i-74)/26) } prob_x_k_n = (0.01+0.99*(k-73)/26)*prob_failure_k_n_74 prob_x_k_n_sum += prob_x_k_n Prob_data_n_sum.push({'Pity': k, 'Probability': prob_x_k_n_sum}) } else { Prob_data_n_sum.push({'Pity': k, 'Probability': prob_x_k_n_sum}) } } let median = 
0.5*Prob_data_n_sum[n-1]['Probability'] let lower_quartile = 0.25*Prob_data_n_sum[n-1]['Probability'] let upper_quartile = 0.75*Prob_data_n_sum[n-1]['Probability'] const plot_2 = Plot.plot({ x: { label: 'Pity, (k)', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) }, y: { label: 'Cumulative Probability, P(x ∩ k)', domain: [0, 1] }, marks: [ Plot.ruleX(Prob_data_n_sum, {x: 'Pity', y: 'Probability', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}), Plot.tip(Prob_data_n_sum, Plot.pointerX({x: 'Pity', y: 'Probability', fill: "var(--bs-body-bg)"})), Plot.frame(), Plot.ruleY([lower_quartile], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}), Plot.ruleY([median], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}), Plot.ruleY([upper_quartile], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}), ], width: width, }) return plot_2}```:::::: {style="display: inline-block; margin-left: -100%; position: relative; left: 60%; top: -15px"}```{ojs}viewof n = Inputs.range( [1, 100], { step: 1, value: 100, label: 'Number of pulls', id: 'n'})```::::::One may notice that if $k\ge n$, then the cumulative joint probability, conditional on $n$, no longer adds to 1. This number is the probability of success, conditional on n, and is given by the sum over the joint probabilities, conditional on $n$, for $k<n$:$$P(x|n) = \sum_{i=0}^{n-1} P(x\cap i|n)$$Note that this is increasing with $n$, reflecting that the probability of success increases the more pulls that you do, until it reaches 1 at hard pity. From this we can find the expected number of pulls to get a success $\bar{n}$, conditional on only $n$ pulls: $$E(\bar{n}|x, n) = 1 + \sum_{i=0}^{k} k P(k|x, n) = 1 + \sum_{i=0}^{k} k \frac{P(x\cap k|n)}{P(x|n)} = 1 + \frac{1}{\sum_{j=0}^{n-1} P(x\cap j|n)}\sum_{i=0}^{k} k P(x\cap k|n)$$These joint densities can be seen in @fig-joint-density-n, where the x-axis is still the pity, like in @fig-joint-density, but one can use same the slider above to see the effect of changing the number of pulls. 
```{ojs}//| label: fig-joint-density-n//| fig-cap: 'Density, P(x ∩ k|n)*k'{ let Density_data_n = [] let prob_x_k_n = 0 let den_k_n = 0 let den_k_sum_n = 0 let prob_x_j_n = 0 let prob_x_j_n_sum = 0 for (var j=0; j<=89; j++){ if (j<74 && j<n) { prob_x_j_n = 0.01*(0.99**j) prob_x_j_n_sum += prob_x_j_n } else if (j>=74 && j<n) { let prob_failure_j_n_74 = 0.99**73 for (var l=74; l<=j; l++){ prob_failure_j_n_74 *= (0.99-0.99*(l-74)/26) } prob_x_j_n = (0.01+0.99*(j-73)/26)*prob_failure_j_n_74 prob_x_j_n_sum += prob_x_j_n } } for (var k=0; k<=99; k++){ if (k<74 && k<n) { prob_x_k_n = 0.01*(0.99**k) den_k_n = prob_x_k_n*k den_k_sum_n += den_k_n Density_data_n.push({'Pity': k, 'Density': den_k_n/prob_x_j_n_sum}) } else if (k>=74 && k<n) { let prob_failure_k_n_74 = 0.99**73 for (var i=74; i<=k; i++){ prob_failure_k_n_74 *= (0.99-0.99*(i-74)/26) } prob_x_k_n = (0.01+0.99*(k-73)/26)*prob_failure_k_n_74 den_k_n = prob_x_k_n*k den_k_sum_n += den_k_n Density_data_n.push({'Pity': k, 'Density': den_k_n/prob_x_j_n_sum}) } else { Density_data_n.push({'Pity': k, 'Density': 0}) } } const plot = Plot.plot({ x: { label: 'Pity', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) }, y: { label: 'Density, P(x ∩ k|n)*k', domain: [0, 8.5] }, marks: [ Plot.ruleX(Density_data_n, {x: 'Pity', y: 'Density', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}), Plot.tip(Density_data_n, Plot.pointerX({x: 'Pity', y: 'Density', fill: "var(--bs-body-bg)"})), Plot.frame(), ], width: width, }) return plot}```The distribution in unchanged from @fig-joint-density when $n=100$, but as $n$ falls the rest of the distribution grows. This happens because as the probability of success decreases with fewer pulls, the densities are scaled up. To show the effect of the number of pulls on the expected number of pulls to get a single success, see the figure below.```{ojs}//| label: fig-exp-pulls-n//| fig-cap: 'Expected Number of Pulls to get a Success, E(n|x, n)'{ let Exp_data_n = [] let prob_success_data = [] for (var N=1; N<=100; N++) { let prob_x_k_n = 0 let den_k_n = 0 let prob_x_j_n = 0 let prob_x_j_n_sum = 0 for (var j=0; j<=89; j++){ if (j<74 && j<N) { prob_x_j_n = 0.01*(0.99**j) prob_x_j_n_sum += prob_x_j_n } else if (j>=74 && j<N) { let prob_failure_j_n_74 = 0.99**73 for (var l=74; l<=j; l++){ prob_failure_j_n_74 *= (0.99-0.99*(l-74)/26) } prob_x_j_n = (0.01+0.99*(j-73)/26)*prob_failure_j_n_74 prob_x_j_n_sum += prob_x_j_n } } let exp_den = 1 for (var k=0; k<=99; k++){ if (k<74 && k<N) { prob_x_k_n = 0.01*(0.99**k) den_k_n = prob_x_k_n*k exp_den += den_k_n/prob_x_j_n_sum } else if (k>=74 && k<N) { let prob_failure_k_n_74 = 0.99**73 for (var i=74; i<=k; i++){ prob_failure_k_n_74 *= (0.99-0.99*(i-74)/26) } prob_x_k_n = (0.01+0.99*(k-73)/26)*prob_failure_k_n_74 den_k_n = prob_x_k_n*k exp_den += den_k_n/prob_x_j_n_sum } else { exp_den += 0 } } Exp_data_n.push({'N': N, 'exp_n_x': exp_den}) prob_success_data.push({'N': N, 'prob_x_j_n_sum': prob_x_j_n_sum}) } const max_exp_den = Math.max(...Exp_data_n.map(d => d.exp_n_x)) const min_exp_den = Math.min(...Exp_data_n.map(d => d.exp_n_x)) const prob_success_data_scaled = d3.scaleLinear(d3.extent(prob_success_data, d => d.prob_x_j_n_sum), [min_exp_den, max_exp_den]) // for (var i=0; i<prob_success_data.length; i++){ // prob_success_data[i]['prob_x_j_n_sum'] = prob_success_data[i]['prob_x_j_n_sum']*max_exp_den // } const plot = Plot.plot({ x: { label: 'No. Pulls', domain: [0, 100], ticks: d3.ticks(0, 100, 20) }, y: { axis: "left", label: 'Exp No. 
Pulls to get a Success', }, marks: [ Plot.ruleX(Exp_data_n, {x: 'N', y: 'exp_n_x', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}), Plot.line(prob_success_data, Plot.mapY((D) => D.map(prob_success_data_scaled), {x: 'N', y: d => d.prob_x_j_n_sum, stroke: 'var(--plot-rule-color-2)', strokeWidth: 2})), Plot.line([[0, 0], [55, 55]], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}), Plot.tip(Exp_data_n, Plot.pointerX({x: 'N', y: 'exp_n_x', fill: "var(--bs-body-bg)"})), Plot.axisY(d3.ticks(0, 1, 10), {color: 'var(--plot-rule-color-2)', anchor: "right", label: 'Probability of Success', y: prob_success_data_scaled, tickFormat: prob_success_data_scaled.tickFormat()}), Plot.frame(), ], width: width, }) return plot}``````{ojs}{ let prob_success_data = [] for (var N=0; N<=100; N++) { let prob_x_k_n = 0 let den_k_n = 0 let prob_x_j_n = 0 let prob_x_j_n_sum = 0 for (var j=0; j<=89; j++){ if (j<74 && j<N) { prob_x_j_n = 0.01*(0.99**j) prob_x_j_n_sum += prob_x_j_n } else if (j>=74 && j<N) { let prob_failure_j_n_74 = 0.99**73 for (var l=74; l<=j; l++){ prob_failure_j_n_74 *= (0.99-0.99*(l-74)/26) } prob_x_j_n = (0.01+0.99*(j-73)/26)*prob_failure_j_n_74 prob_x_j_n_sum += prob_x_j_n } } let exp_den = 1 for (var k=0; k<=99; k++){ if (k<74 && k<N) { prob_x_k_n = 0.01*(0.99**k) den_k_n = prob_x_k_n*k exp_den += den_k_n/prob_x_j_n_sum } else if (k>=74 && k<N) { let prob_failure_k_n_74 = 0.99**73 for (var i=74; i<=k; i++){ prob_failure_k_n_74 *= (0.99-0.99*(i-74)/26) } prob_x_k_n = (0.01+0.99*(k-73)/26)*prob_failure_k_n_74 den_k_n = prob_x_k_n*k exp_den += den_k_n/prob_x_j_n_sum } else { exp_den += 0 } } prob_success_data.push({'N': N, 'prob_x_j_n_sum': prob_x_j_n_sum}) } const plot = Plot.plot({ x: { label: 'No. Pulls', domain: [0, 100], ticks: d3.ticks(0, 100, 20) }, y: { axis: "left", label: 'Prob of Success', domain: [0, 1] }, marks: [ Plot.line(prob_success_data, {x: 'N', y: 'prob_x_j_n_sum', stroke: 'var(--plot-rule-color-2)', strokeWidth: 2}), Plot.tip(prob_success_data, Plot.pointerX({x: 'N', y: 'prob_x_j_n_sum', fill: "var(--bs-body-bg)"})), Plot.frame(), ], width: width, }) return plot}```As we can see, much of the expected number of pulls for a success is driven by the probability of success (conditional on the number of pulls), which is increasing with the number of pulls. Each interval, one before and one after soft pity, is also concave, whilst also always being less than a $45^{\circ}$ line. ### SparksNot all gacha systems include a pity system, and instead rely on a 'spark' system. This system is similar to the pity system, in the fact that it restricts the maximum number of pulls you need to do before acquiring the desired item. The difference is that the 'spark' system does not manipulate probabilities, but rather guarantees the choice of the item after a certain number of pulls. A common number for this is 200 pulls, and if the (pre-chosen) desired item has not been obtained by this point, one can choose to exchange 'sparks' for it. A single spark is obtained for every failed pull. This system is more simple to calculated expected values for the number of pulls, as the probability of success is a simple geometric distribution. 
The system under pity is also geometrically distributed, but the probability of success is not constant, and is dependent on the number of pulls, whereas the probability of success in the spark system is constant.$$P(n=\bar{n}) = (1-p)^{\bar{n}-1} p$$The expected number of pulls to obtain the desired item is given by @eq-exp-pulls-spark:$$E(\bar{n}) = \frac{1}{p} = \frac{1}{0.01} = 100$$ {#eq-exp-pulls-spark}This expected value is based on the assumption that one will do 200 pulls (if necessary), so success is guaranteed. Unsurprisingly, with the same probability of success, the expected number of pulls is larger than the pity system. I have also simulated the spark system for 100000 pulls (p=0.01 and number of sparks for legendary is 200), and the distribution of the number of pulls can be seen in @fig-simulation-spark. The statistics from these simulations, and the predicted amounts, can be seen in the table below.```{python}import pandas as pdimport jsonlegendary_obtained_pity_list_spark = pd.read_csv('simple_obtained_pity_list_spark.csv')['pity'].to_list()legendary_obtained_pity_stats_spark = json.loads(open('simple_sim_spark.json').read())legendary_count_spark = legendary_obtained_pity_stats_spark['legendary_count']min_legendary_pity_spark = legendary_obtained_pity_stats_spark['min_pity']max_legendary_pity_spark = legendary_obtained_pity_stats_spark['max_pity']ojs_define(legendary_obtained_pity_list_spark = legendary_obtained_pity_list_spark)ojs_define(legendary_count_spark = legendary_count_spark)ojs_define(legendary_min_pity_spark = min_legendary_pity_spark)ojs_define(legendary_max_pity_spark = max_legendary_pity_spark)``````{ojs}//| label: fig-simulation-spark//| fig-cap: 'Pity Distribution for Legendary Items under Spark System'{ let Pity_data_spark = [] let Pity_predicted_spark = [] let prob_x_k_spark = 0 let pity_list_spark = legendary_obtained_pity_list_spark for (var i=0; i<=legendary_max_pity_spark; i++){ let count = pity_list_spark.filter(x => x == i).length Pity_data_spark.push({'Pity': i, 'Count': count}) } for (var k=0; k<=legendary_max_pity_spark; k++){ prob_x_k_spark = 0.01*(0.99**k) Pity_predicted_spark.push({'Pity': k, 'Count': prob_x_k_spark*legendary_count_spark}) } const plot = Plot.plot({ x: { label: 'Pity', domain: [0, 800] }, y: { label: 'Count', }, color: { legend: true, domain: ['Simulation Data', 'Prediction'], range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)'], }, marks: [ Plot.dot(Pity_data_spark, {x: 'Pity', y: 'Count', r: 0}), Plot.ruleX(Pity_data_spark, {y: 'Count', x: 'Pity', stroke: 'var(--plot-rule-color-1)', strokeWidth: 1}), Plot.tip(Pity_data_spark, Plot.pointerX({x: 'Pity', y: 'Count', fill: "var(--bs-body-bg)"})), Plot.line(Pity_predicted_spark, {x: 'Pity', y: 'Count', stroke: 'var(--plot-rule-color-2)', strokeWidth: 2}), Plot.frame(), ], width: width, }) return plot}```::: {#tbl-spark-stats}<table style="width: 100%"> <tr style="border-bottom: 1px solid black"> <th></th> <th>Mean Pity</th> <th>Modal Pity</th> <th>25% Percentile Pity</th> <th>Median Pity</th> <th>75% Percentile Pity</th> </tr> <tr> <td>Simulation</td> <td>`{python} round(legendary_obtained_pity_stats_spark['mean_pity'], 2)`</td> <td>`{python} legendary_obtained_pity_stats_spark['modal_pity']`</td> <td>`{python} legendary_obtained_pity_stats_spark['percentile_25']`</td> <td>`{python} legendary_obtained_pity_stats_spark['median_pity']`</td> <td>`{python} legendary_obtained_pity_stats_spark['percentile_75']`</td> </tr> <tr> <td>Prediction [^2]</td> <td>100</td> 
<td>1</td> <td>`{python} np.ceil(np.log(0.75)/np.log(0.99))`</td> <td>`{python} np.ceil(np.log(0.50)/np.log(0.99))`</td> <td>`{python} np.ceil(np.log(0.25)/np.log(0.99))`</td> </tr></table>Descriptive Statistics for the Spark System:::[^2]: The quantiles were calculated using $\frac{\ln(1-q)}{\ln(1-p)}$, where $q$ is the quantile, and $p$ is the probability of success. These were then rounded up to the nearest integer.Note, there are many observations above the spark threshold of 200, which means that it took more than 200 pulls to get a legendary item *naturally*, but one can assume that the player would choose the legendary item after 200 pulls, and would only continue to pull if they want further copies, or if a new limited-time item is released (in which case the spark system would reset, but their previous pull count would not). Therefore, we can view these observations as cases where the player would select a number of legendary items through the spark system and then obtains a legendary item through chance after the observed pulls. ### 50/50Apologies, I should have included a trigger warning here for the gacha gamers reading this. The dreaded '50/50' is something that has turned great excitement into extreme disappointment and anger. The reason for why is that, in most gacha systems, when you get a legendary item, you're not always guaranteed that it's the legendary item you want. In fact it can be legendary item you really don't want. Until now, i've assumed a single item of the highest rarity (in fact for any rarity). However, usually there is a set of legendary items, of which 1 will be chosen. This set will include a single limited-time item: normally more powerful, better looking, more fun; and also a subset of 'standard' items. When you obtain a legendary item, there is a 50% probability that it is the limited-time item, but also a 50% probability that it is any one of the 'standard' items. This is known as the 50/50. Typically, losing the 50% chance of the legendary item of choice, will guarantee it for the next legendary item obtained, limiting the number of times one can lose this 50/50. Some systems will have variations where there is more than one limited-time item and the 50/50 is split across them, with a separate probability determining whether one obtains a limited legendary item or a 'standard' legendary item. For example, assume that there are 2 limited-time legendary items; on pulling a legendary, there is a random draw to determine whether ths legendary will be one of the 2 limited-time items or a standard item, and say these probabilities are 75% to 25%; then, if a limited-time item is chosen, there is a 50% chance that it will be either of the 2 limited-time items; if a standard item is chosen, the probabilities are also split uniformly across however many standard items are in the set. Knowing that your chances of getting the limited-time item is not certain, even when you get a legendary item, will change the expected number of pulls, but it's not obviously clear by how much it would increase this number. Instinct may tell us that the expected number of pulls will be half way between winning on the first legendary item, and losing the 50/50 and being guaranteed it on the next legendary. The expected value will, by definition, be between these two values, but the expectation is not the sum of the expectations of 2 legendary pull sequences. 
This is because the latter sequence is conditional on the first sequence, as 50% of the time (on average) you win on the first legendary, thus erasing the need to keep pulling. Let us continue with the example used previously, but add a 50/50 mechanic. The range of pulls needed to the a single limited-time legendary, is now between 1 pull (if you're lucky) and 200 pulls (if you're really unlucky). Again, assume for simplicity that one will do enough pulls to guarantee the desired item, such that $p(x)=1$. We can then calculate the expected number of pulls needed to get the limited-time legendary using the same conditional probability distribution in @fig-cond-prob, but changing the probability of observing each pity number to reflect the 50/50. The key difference here is that some of the possible values for the number of pulls for a success have (broadly) 2 different paths to them. For example, a successful pull of the limited-time legendary after 10 pulls can be achieved by either winning the 50/50 on the 10th pull, or losing the 50/50 before this, but getting another legendary on the 10th pull. These are the 2 broadly different paths, but actually there is 10 different possibilities here: winning the 50/50 on the 10th pull is one, but there are 9 different ways to lose the 50/50 before the 10th pull, and then get the limited-time legendary on the 10th pull. Therefore, the probability of getting the limited-time legendary on the 10th pull is the sum of the probabilities of these 10 different paths. For this example, the probability is:$$p(\bar{n}=10) = \underbrace{(0.99^9)}_{\text{9 losses}}\underbrace{(0.01)}_{\text{1 win}}\underbrace{(0.5)}_{\text{win 50/50}} + 9\underbrace{(0.99^8)}_{\text{8 losses}}\underbrace{(0.01^2)}_{\text{2 wins}}\underbrace{(0.5)}_{\text{lose 50/50}} = (0.99^8)(0.01)(0.5)(0.99+0.01) \approx 0.00046$$Note that I have used the probability of success of 0.01 as we are well before the soft pity. If we use the general terms for the probability conditional on the pity $p(x|k)$, we would have:$$p(\bar{n}=10) = \underbrace{[\Pi_{i=0}^{8}(1-p(x|k=i))]}_{\text{1st 9 are losses}} \underbrace{p(x|k=9)}_{\text{1 win in 10}} \underbrace{(0.5)}_{\text{win 50/50}} + \text{Pr(8 losses and 2 wins)} \underbrace{(0.5)}_{\text{lose 50/50}}$$This term I have not calculated is more complex as it needs to contain the multiple different ways of achieving $n-2$ losses and 2 wins, specifically with the second win at $n$. To calculate this, see the below table of probabilities. Each row represents when the first win occurs, and then the probability of this sequence happening. 
<table style="width: 100%"> <tr> <th style="width: 75px">First Win</th> <th colspan="7" style="text-align: center">Probability</th> </tr> <tr> <td>$m$</td> <td>$Pr(\text{m-1 losses})$</td> <td></td> <td>$Pr(\text{Win in pull m})$</td> <td></td> <td>$Pr(\text{Loss until pull 9})$</td> <td></td> <td>$Pr(\text{Win in pull 10})$</td> </tr> <tr> <td>1</td> <td>1</td> <td>$\times$</td> <td>$p(x|k=0)$</td> <td>$\times$</td> <td>$[\Pi_{i=0}^{7}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=8)$</td> </tr> <tr> <td>2</td> <td>$(1-p(x|k=0))$</td> <td>$\times$</td> <td>$p(x|k=1)$</td> <td>$\times$</td> <td>$[\Pi_{i=0}^{6}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=7)$</td> </tr> <tr> <td>3</td> <td>$[\Pi_{i=0}^{1}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=2)$</td> <td>$\times$</td> <td>$[\Pi_{i=0}^{5}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=6)$</td> </tr> <tr> <td>4</td> <td>$[\Pi_{i=0}^{2}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=3)$</td> <td>$\times$</td> <td>$[\Pi_{i=0}^{4}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=5)$</td> </tr> <tr> <td>5</td> <td>$[\Pi_{i=0}^{3}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=4)$</td> <td>$\times$</td> <td>$[\Pi_{i=0}^{3}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=4)$</td> </tr> <tr> <td>6</td> <td>$[\Pi_{i=0}^{4}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=5)$</td> <td>$\times$</td> <td>$[\Pi_{i=0}^{2}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=3)$</td> </tr> <tr> <td>7</td> <td>$[\Pi_{i=0}^{5}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=6)$</td> <td>$\times$</td> <td>$[\Pi_{i=0}^{1}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=2)$</td> </tr> <tr> <td>8</td> <td>$[\Pi_{i=0}^{6}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=7)$</td> <td>$\times$</td> <td>$(1-p(x|k=0))$</td> <td>$\times$</td> <td>$p(x|k=1)$</td> </tr> <tr> <td>9</td> <td>$[\Pi_{i=0}^{7}(1-p(x|k=i))]$</td> <td>$\times$</td> <td>$p(x|k=8)$</td> <td>$\times$</td> <td>1</td> <td>$\times$</td> <td>$p(x|k=0)$</td></table>This table does show a pattern, but as the 1st and last terms contain a degenerate probability of 1, the generalisation needs to not include these in the larger summation. Therefore, the general formula contains 2 main terms: the combination the first ans last terms, which are the same due to the symmetry here; and the total summation of all the terms in between. Note, that this only works for $n\le4$, such that the second summation term contains any terms. I have also changed the notation of the conditional probabilities such that the pity value it is conditioned on is in the subscript, for example $p(x|k=0)$ is now $p_0$.$$p(n-2\text{ losses and } 2 \text{ wins}) = 2p_0[\Pi_{i=0}^{n-3}(1-p_i)]p_{n-2}$$$$+\sum_{m=2}^{n-2} [\Pi_{i=0}^{m-2}(1-p_i)]p_{m-1}[\Pi_{i=0}^{n-m-2}(1-p_i)]p_{n-m-1}$$Combining with the previous equations, gives us the complete formulas: $$p(\bar{n}=n) = \frac{1}{2}[\Pi_{i=0}^{n-2}(1-p_i)]p_{n-1}+ p_0[\Pi_{i=0}^{n-3}(1-p_i)]p_{n-2}+ \frac{1}{2}\sum_{m=2}^{n-2} [\Pi_{i=0}^{m-2}(1-p_i)]p_{m-1}[\Pi_{i=0}^{n-m-2}(1-p_i)]p_{n-m-1}$$ {#eq-50-50}Somehow my simplifications make this look more complex. Surely it simplifies... right? One way we could simplify the calculation of this is to separate the equation before n=75 (i.e. soft-pity plus 1) and after. 
Before soft-pity, all marginal probabilities are the same $(p(x|k)=p=0.01)$, so the equation simplifies to:$$p(\bar{n}=n\le75) = \frac{1}{2}(1-p)^{n-1}p+ (1-p)^{n-2}p^2 + (\frac{n-3}{2})(1-p)^{n-2}p^2$$$$p(\bar{n}=n\le75) = \frac{1}{2}(1-p)^{n-2}p(1+(n-2)p)$$ {#eq-50-50-<=75}This equation is not valid for values of $n$ below 4, so we can only use it above this. I will therefore state the formulas for n=1, 2, and 3 here:The next range to look at is if $76\le n\le148$, as this is the range where 2 successful pulls could include one from after soft pity but could also avoid soft pity. For example, one could lose the 50-50 on pull 74, but win on the very next pull (75), or one could have to wait another 74 pulls to win (148). The key difference between this range and the previous range below 74, is that it is possible to pass soft pity. This will complicate the probabilities, but only in that the probabilities are not the same for each pull, therefore the logical process is the same. Suppose one gets success after 100 pulls, similar to the table above, knowing that there must be 1 50-50 loss, there are 99 different paths to this point, each defined by where this 50-50 loss occurs. If this 50-50 loss occurred anywhere between (and including) pull 26 and 74, then we need not worry about soft pity and all of these cases will not require it. Outside these bounds then one of the legendary wins occurred after soft pity, and therefore used a higher conditional probability of success. For any $n\in[76, 148]$, we need not use the post-soft pity probabilities if the pull of first win $m\in[n-74,74]$. We can use @eq-50-50 generally for all value of $n\le4$, but it may be easier to use a constant $p$ for any pity value less than 74, and separate the probabilities likewise. The last range to assess is that for $n\ge149$. This is the range where it is guaranteed that at least one of the legendary wins will be after soft pity. At $n=149$, this will be one win after 74 and the other after 75, whereas for every value of n above this both pulls will be after soft pity. 
:::{style="display: flex"}::: {style="display: inline-block;"}```{ojs}//| label: fig-50-50-dist//| fig-cap: 'Distribution of Successes under a 50-50 System'{ if (fig_50_50_stacked == true) { var loss_y1 = 'Probability if 50/50 Win' var loss_y2 = 'Total Probability' } else { var loss_y1 = 0 var loss_y2 = 'Probability if 50/50 Loss' } const plot = Plot.plot({ x: { label: 'Pull Number', domain: [0, 200] }, y: { label: 'Pr', domain: [0, 0.04] }, color: { legend: true, domain: ['Winning 50/50', 'Losing 50/50'], range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)'], }, marks: [ Plot.ruleX(data_with_50_50, {y: 'Probability if 50/50 Win', x: 'Pull Number', stroke: 'var(--plot-rule-color-1)', strokeWidth: 4}), Plot.ruleX(data_with_50_50, {y1: loss_y1, y2: loss_y2, x: 'Pull Number', stroke: 'var(--plot-rule-color-2)', strokeWidth: 4}), Plot.tip(data_with_50_50, Plot.pointerX({ x: 'Pull Number', y: 'Total Probability', channels: { loss: { value: 'Probability if 50/50 Loss', label: 'Pr if Lose 50/50' }, win: { value: 'Probability if 50/50 Win', label: 'Pr if Win 50/50' } }, fill: "var(--bs-body-bg)"})), // Plot.tip(data_with_50_50, Plot.pointerX({x: 'Pull Number', y: 'Probability if 50/50 Loss', stroke: 'orange', fill: "var(--bs-body-bg)"})), Plot.frame(), ], width: width, }) return plot}```:::::: {style="display: inline-block; margin-left: -100%; position: relative; left: 80%"}```{ojs}viewof fig_50_50_stacked = Inputs.toggle({ label: 'Stacked', value: false})```::::::The blue series shows outcomes where the 50/50 is won, therefore reflects @fig-cond-prob. The orange series shows outcomes where the 50/50 is lost previously and the guaranteed win occurs on this pull. This series is influenced by the conditional probability distribution, in that the probability has peaks after the soft pity and after 2 times the soft pity, as these are the most likely places for wins to occur. A clear difference between the series' is in the first pulls until soft pity, where the blue series is falling, as higher number of pulls needed implies more failures, whereas the orange series is rising, as this requires two wins and the number of possible ways of doing this increases in the number of pulls needed. This is also why the orange series starts falling from the peak after the first soft pity, until the second soft pity. This is because of the hard pity, which starts to reduce the number of combinations of pulls needed to get 2 wins. For example, compare obtaining a guaranteed legendary (after losing the 50/50) after 100 pulls to after 140 pulls: there are 99 different ways to win twice in 100 pulls, where the second win is at 100; but there are only 61 different ways to win twice in 140 pulls, where the second win is at 130, as the first win must be after 40 pulls but before 100 pulls (including 40 & 90). Descriptive statistics of the above distribution can be found below in @tbl-50-50-stats. The table includes these statistics for winning the 50/50, losing the 50/50, and the total probability of success. THe total probability is the probability actually faced, and what determines the price. 
```{python}import jsonlegendary_obtained_50_50_1 = json.loads(open('50-50-graph-data-1.json').read())legendary_obtained_50_50_2 = json.loads(open('50-50-graph-data-2.json').read())legendary_obtained_50_50_3 = json.loads(open('50-50-graph-data-3.json').read())legendary_obtained_50_50_1_2 = json.loads(open('50-50-graph-data-1-2.json').read())legendary_obtained_50_50_2_2 = json.loads(open('50-50-graph-data-2-2.json').read())legendary_obtained_50_50_3_2 = json.loads(open('50-50-graph-data-3-2.json').read())```::: {#tbl-50-50-stats}<table style="width: 100%"> <tr style="border-bottom: 1px solid black"> <th></th> <th>Mean Pull No.</th> <th>Modal Pull No.</th> <th>25% Percentile Pull No.</th> <th>Median Pull No.</th> <th>75% Percentile Pull No.</th> </tr> <tr> <td>50/50 Win</td> <td>`{python} round(legendary_obtained_50_50_1['win_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1['win_mode']`</td> <td>`{python} legendary_obtained_50_50_1['win_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1['win_median']`</td> <td>`{python} legendary_obtained_50_50_1['win_75_percentile']`</td> </tr> <tr> <td>50/50 Loss</td> <td>`{python} round(legendary_obtained_50_50_1['lose_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1['lose_mode']`</td> <td>`{python} legendary_obtained_50_50_1['lose_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1['lose_median']`</td> <td>`{python} legendary_obtained_50_50_1['lose_75_percentile']`</td> </tr> <tr style="border-top: 1px solid black"> <td>Combined</td> <td>`{python} round(legendary_obtained_50_50_1['combined_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1['combined_mode']`</td> <td>`{python} legendary_obtained_50_50_1['combined_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1['combined_median']`</td> <td>`{python} legendary_obtained_50_50_1['combined_75_percentile']`</td> </tr></table>Descriptive Statistics for the 50/50 System:::One can see that the main effect of the 50/50 system is to increase the expected number of pulls needed to obtain the desired item. Every single statistic in @tbl-50-50-stats is higher with the 50/50 (the mode is the same), increasing the mean by `{python} 100*legendary_obtained_50_50_1['combined_mean']/legendary_obtained_50_50_1['win_mean']-100`%. I have not rounded this number up to 50% yet, as it is suspiciously close to 50% that this *may* not be coincidence. In the next paragraph I will check this (will likely put this in an appendix). On the face of it, this result does make sense: half the time, it will take twice as long to obtained the desired item. Therefore, on average, the number of pulls needed should increase by 50%. Comparing the mean pull number for 50/50 win and loss, it takes twice as many pulls on average to get the desired item if you lose the 50/50 compared to winning it. If you compare the un-rounded numbers, the exact factor is slightly less than 2, the question is whether this is due to rounding error in previous steps, or if this is supported mathematically. Firstly, if there were no pity system (i.e. a constant probability of success), the probability for a single success follows a geometric distribution, and the expected number of pulls needed to get 1 success is $\frac{1}{p}$. If we consider the 50/50 system, if you lose the 50/50 then the probability distribution were are concerned with is a negative binomial distribution, as we need to know how many trials are needed for 2 successes. 
Before diving into the slightly daunting maths, first let's try to replicate the result with different probabilities. For example, reduce the conditional probability of success before soft pity (call it $\underline{p}$) from 1% to 0.5%, or increase it to 2%. The table below shows the results for these two cases. The rows labelled '50/50 % inc' show the percentage increase in the mean number of pulls needed to get the desired item, compared to the mean number of pulls needed without the 50/50 system.

::: {#tbl-50-50-stats-2}
<table style="width: 100%">
 <tr> <th colspan="3" style="border-right: 1px solid black">Parameters</th> <th colspan="5">Pull Number</th> </tr>
 <tr style="border-bottom: 1px solid black"> <th>Soft Pity</th> <th>$\underline{p}$</th> <th style="border-right: 1px solid black"></th> <th>Mean</th> <th>Modal</th> <th>25% Percentile</th> <th>Median</th> <th>75% Percentile</th> </tr>
 <tr> <td rowspan="17" style="border-bottom: 1px solid black">74</td> <td rowspan="5">0.5%</td> <td style="border-right: 1px solid black">50/50 Win</td> <td>`{python} round(legendary_obtained_50_50_2['win_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_2['win_mode']`</td> <td>`{python} legendary_obtained_50_50_2['win_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_2['win_median']`</td> <td>`{python} legendary_obtained_50_50_2['win_75_percentile']`</td> </tr>
 <tr> <td style="border-right: 1px solid black">50/50 Loss</td> <td>`{python} round(legendary_obtained_50_50_2['lose_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_2['lose_mode']`</td> <td>`{python} legendary_obtained_50_50_2['lose_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_2['lose_median']`</td> <td>`{python} legendary_obtained_50_50_2['lose_75_percentile']`</td> </tr>
 <tr style="border-top: 1px solid black; border-bottom: 1px solid black"> <td style="border-right: 1px solid black">Combined</td> <td>`{python} round(legendary_obtained_50_50_2['combined_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_2['combined_mode']`</td> <td>`{python} legendary_obtained_50_50_2['combined_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_2['combined_median']`</td> <td>`{python} legendary_obtained_50_50_2['combined_75_percentile']`</td> </tr>
 <tr style="height: 1rem"> <th style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr>
 <tr style="border-bottom: 1px solid black; font-weight: bold"> <td style="border-right: 1px solid black">50/50 % inc</td> <td>`{python} round(100*legendary_obtained_50_50_2['combined_mean']/legendary_obtained_50_50_2['win_mean']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_2['combined_mode']/legendary_obtained_50_50_2['win_mode']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_2['combined_25_percentile']/legendary_obtained_50_50_2['win_25_percentile']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_2['combined_median']/legendary_obtained_50_50_2['win_median']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_2['combined_75_percentile']/legendary_obtained_50_50_2['win_75_percentile']-100, 4)`</td> </tr>
 <tr style="height: 1rem"> <th colspan="2" style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr>
 <tr> <td rowspan="5">1%</td> <td
style="border-right: 1px solid black">50/50 Win</td> <td>`{python} round(legendary_obtained_50_50_1['win_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1['win_mode']`</td> <td>`{python} legendary_obtained_50_50_1['win_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1['win_median']`</td> <td>`{python} legendary_obtained_50_50_1['win_75_percentile']`</td> </tr> <tr> <td style="border-right: 1px solid black">50/50 Loss</td> <td>`{python} round(legendary_obtained_50_50_1['lose_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1['lose_mode']`</td> <td>`{python} legendary_obtained_50_50_1['lose_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1['lose_median']`</td> <td>`{python} legendary_obtained_50_50_1['lose_75_percentile']`</td> </tr> <tr style="border-top: 1px solid black; border-bottom: 1px solid black"> <td style="border-right: 1px solid black">Combined</td> <td>`{python} round(legendary_obtained_50_50_1['combined_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1['combined_mode']`</td> <td>`{python} legendary_obtained_50_50_1['combined_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1['combined_median']`</td> <td>`{python} legendary_obtained_50_50_1['combined_75_percentile']`</td> </tr> <tr style="height: 1rem"> <th style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr> <tr style="border-bottom: 1px solid black; font-weight: bold"> <td style="border-right: 1px solid black">50/50 % inc</td> <td>`{python} round(100*legendary_obtained_50_50_1['combined_mean']/legendary_obtained_50_50_1['win_mean']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_1['combined_mode']/legendary_obtained_50_50_1['win_mode']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_1['combined_25_percentile']/legendary_obtained_50_50_1['win_25_percentile']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_1['combined_median']/legendary_obtained_50_50_1['win_median']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_1['combined_75_percentile']/legendary_obtained_50_50_1['win_75_percentile']-100, 4)`</td> </tr> <tr style="height: 1rem"> <th colspan="2" style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr> <tr> <td rowspan="5">2%</td> <td style="border-right: 1px solid black">50/50 Win</td> <td>`{python} round(legendary_obtained_50_50_3['win_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_3['win_mode']`</td> <td>`{python} legendary_obtained_50_50_3['win_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_3['win_median']`</td> <td>`{python} legendary_obtained_50_50_3['win_75_percentile']`</td> </tr> <tr> <td style="border-right: 1px solid black">50/50 Loss</td> <td>`{python} round(legendary_obtained_50_50_3['lose_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_3['lose_mode']`</td> <td>`{python} legendary_obtained_50_50_3['lose_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_3['lose_median']`</td> <td>`{python} legendary_obtained_50_50_3['lose_75_percentile']`</td> </tr> <tr style="border-top: 1px solid black; border-bottom: 1px solid black"> <td style="border-right: 1px solid black">Combined</td> <td>`{python} round(legendary_obtained_50_50_3['combined_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_3['combined_mode']`</td> <td>`{python} legendary_obtained_50_50_3['combined_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_3['combined_median']`</td> <td>`{python} legendary_obtained_50_50_3['combined_75_percentile']`</td> 
</tr> <tr style="height: 1rem"> <th style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr> <tr style="border-bottom: 1px solid black; font-weight: bold"> <td style="border-right: 1px solid black">50/50 % inc</td> <td>`{python} round(100*legendary_obtained_50_50_3['combined_mean']/legendary_obtained_50_50_3['win_mean']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_3['combined_mode']/legendary_obtained_50_50_3['win_mode']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_3['combined_25_percentile']/legendary_obtained_50_50_3['win_25_percentile']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_3['combined_median']/legendary_obtained_50_50_3['win_median']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_3['combined_75_percentile']/legendary_obtained_50_50_3['win_75_percentile']-100, 4)`</td> </tr> <tr style="height: 1rem"></tr> <tr style="border-bottom: 1px solid black"> <th>Soft Pity</th> <th>$\underline{p}$</th> <th style="border-right: 1px solid black"></th> <th>Mean</th> <th>Modal</th> <th>25% Percentile</th> <th>Median</th> <th>75% Percentile</th> </tr> <tr> <td rowspan="17">64</td> <td rowspan="5">0.5%</td> <td style="border-right: 1px solid black">50/50 Win</td> <td>`{python} round(legendary_obtained_50_50_2_2['win_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_2_2['win_mode']`</td> <td>`{python} legendary_obtained_50_50_2_2['win_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_2_2['win_median']`</td> <td>`{python} legendary_obtained_50_50_2_2['win_75_percentile']`</td> </tr> <tr> <td style="border-right: 1px solid black">50/50 Loss</td> <td>`{python} round(legendary_obtained_50_50_2_2['lose_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_2_2['lose_mode']`</td> <td>`{python} legendary_obtained_50_50_2_2['lose_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_2_2['lose_median']`</td> <td>`{python} legendary_obtained_50_50_2_2['lose_75_percentile']`</td> </tr> <tr style="border-top: 1px solid black; border-bottom: 1px solid black"> <td style="border-right: 1px solid black">Combined</td> <td>`{python} round(legendary_obtained_50_50_2_2['combined_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_2_2['combined_mode']`</td> <td>`{python} legendary_obtained_50_50_2_2['combined_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_2_2['combined_median']`</td> <td>`{python} legendary_obtained_50_50_2_2['combined_75_percentile']`</td> </tr> <tr style="height: 1rem"> <th style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr> <tr style="border-bottom: 1px solid black; font-weight: bold"> <td style="border-right: 1px solid black">50/50 % inc</td> <td>`{python} round(100*legendary_obtained_50_50_2_2['combined_mean']/legendary_obtained_50_50_2_2['win_mean']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_2_2['combined_mode']/legendary_obtained_50_50_2_2['win_mode']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_2_2['combined_25_percentile']/legendary_obtained_50_50_2_2['win_25_percentile']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_2_2['combined_median']/legendary_obtained_50_50_2_2['win_median']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_2_2['combined_75_percentile']/legendary_obtained_50_50_2_2['win_75_percentile']-100, 4)`</td> </tr> <tr style="height: 1rem"> <th colspan="2" style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr> <tr> <td rowspan="5">1%</td> <td 
style="border-right: 1px solid black">50/50 Win</td> <td>`{python} round(legendary_obtained_50_50_1_2['win_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1_2['win_mode']`</td> <td>`{python} legendary_obtained_50_50_1_2['win_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1_2['win_median']`</td> <td>`{python} legendary_obtained_50_50_1_2['win_75_percentile']`</td> </tr> <tr> <td style="border-right: 1px solid black">50/50 Loss</td> <td>`{python} round(legendary_obtained_50_50_1_2['lose_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1_2['lose_mode']`</td> <td>`{python} legendary_obtained_50_50_1_2['lose_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1_2['lose_median']`</td> <td>`{python} legendary_obtained_50_50_1_2['lose_75_percentile']`</td> </tr> <tr style="border-top: 1px solid black; border-bottom: 1px solid black"> <td style="border-right: 1px solid black">Combined</td> <td>`{python} round(legendary_obtained_50_50_1_2['combined_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_1_2['combined_mode']`</td> <td>`{python} legendary_obtained_50_50_1_2['combined_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_1_2['combined_median']`</td> <td>`{python} legendary_obtained_50_50_1_2['combined_75_percentile']`</td> </tr> <tr style="height: 1rem"> <th style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr> <tr style="border-bottom: 1px solid black; font-weight: bold"> <td style="border-right: 1px solid black">50/50 % inc</td> <td>`{python} round(100*legendary_obtained_50_50_1_2['combined_mean']/legendary_obtained_50_50_1_2['win_mean']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_1_2['combined_mode']/legendary_obtained_50_50_1_2['win_mode']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_1_2['combined_25_percentile']/legendary_obtained_50_50_1_2['win_25_percentile']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_1_2['combined_median']/legendary_obtained_50_50_1_2['win_median']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_1_2['combined_75_percentile']/legendary_obtained_50_50_1_2['win_75_percentile']-100, 4)`</td> </tr> <tr style="height: 1rem"> <th colspan="2" style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr> <tr> <td rowspan="5">2%</td> <td style="border-right: 1px solid black">50/50 Win</td> <td>`{python} round(legendary_obtained_50_50_3_2['win_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_3_2['win_mode']`</td> <td>`{python} legendary_obtained_50_50_3_2['win_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_3_2['win_median']`</td> <td>`{python} legendary_obtained_50_50_3_2['win_75_percentile']`</td> </tr> <tr> <td style="border-right: 1px solid black">50/50 Loss</td> <td>`{python} round(legendary_obtained_50_50_3_2['lose_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_3_2['lose_mode']`</td> <td>`{python} legendary_obtained_50_50_3_2['lose_25_percentile']`</td> <td>`{python} legendary_obtained_50_50_3_2['lose_median']`</td> <td>`{python} legendary_obtained_50_50_3_2['lose_75_percentile']`</td> </tr> <tr style="border-top: 1px solid black; border-bottom: 1px solid black"> <td style="border-right: 1px solid black">Combined</td> <td>`{python} round(legendary_obtained_50_50_3_2['combined_mean'], 2)`</td> <td>`{python} legendary_obtained_50_50_3_2['combined_mode']`</td> <td>`{python} legendary_obtained_50_50_3_2['combined_25_percentile']`</td> <td>`{python} 
legendary_obtained_50_50_3_2['combined_median']`</td> <td>`{python} legendary_obtained_50_50_3_2['combined_75_percentile']`</td> </tr>
 <tr style="height: 1rem"> <th style="border-right: 1px solid black"></th> <th colspan="5"></th> </tr>
 <tr style="border-bottom: 1px solid black; font-weight: bold"> <td style="border-right: 1px solid black">50/50 % inc</td> <td>`{python} round(100*legendary_obtained_50_50_3_2['combined_mean']/legendary_obtained_50_50_3_2['win_mean']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_3_2['combined_mode']/legendary_obtained_50_50_3_2['win_mode']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_3_2['combined_25_percentile']/legendary_obtained_50_50_3_2['win_25_percentile']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_3_2['combined_median']/legendary_obtained_50_50_3_2['win_median']-100, 4)`</td> <td>`{python} round(100*legendary_obtained_50_50_3_2['combined_75_percentile']/legendary_obtained_50_50_3_2['win_75_percentile']-100, 4)`</td> </tr>
</table>

Descriptive Statistics for the 50/50 System, Hard Pity=89, Soft Pity=74 & 64, $\underline{p}=0.5\%$, $\underline{p}=1\%$ & $\underline{p}=2\%$
:::

From @tbl-50-50-stats-2, the mean number of pulls needed to get the desired item increases by `{python} 100*legendary_obtained_50_50_2['combined_mean']/legendary_obtained_50_50_2['win_mean']-100`%. Whilst this is not the exact same increase as before, it is very close, and close enough to suggest that this is not a mere coincidence. One can also see that the change in the soft pity has no effect on this either, so suppose the soft pity is zero: the conditional probability distribution is then linear from pull 1 to pull $N$. In this case we can use $\underline{p}$ as the very first probability, when pity is zero, and the conditional probabilities will rise from there to equal 1 at the hard pity of $N-1$. The conditional probability would be:
$$P(x|k) = \underline{p} + \frac{1-\underline{p}}{N-1} k$$
The conditional and joint probabilities are shown in @fig-cond-joint-prob-zero-soft-pity.

:::{style="display: flex"}
::: {style="display: inline-block;"}
```{ojs}
//| label: fig-cond-joint-prob-zero-soft-pity
//| fig-cap: 'Conditional and Joint Probability of Success, P(x|k) & P(x,k)'
{
  let Prob_data = []
  let cond_prob_x_k = 0
  let prob_k = 1
  let prob_x_k = 0
  for (var k=0; k<=99; k++){
    cond_prob_x_k = p_ + (1-p_)*k/100
    prob_x_k = cond_prob_x_k * prob_k
    prob_k = prob_k*(1-cond_prob_x_k)
    Prob_data.push({'Pity': k, 'Cond. Probability': cond_prob_x_k, 'Joint Probability': prob_x_k, 'Probability(k)': prob_k})
  }
  const plot_1 = Plot.plot({
    x: { label: 'Pity, (k)', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) },
    y: { label: 'Conditional Probability, P(x|k)', domain: [0, 1] },
    marks: [
      Plot.ruleX(Prob_data, {x: 'Pity', y: 'Cond. Probability', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}),
      Plot.tip(Prob_data, Plot.pointerX({x: 'Pity', y: 'Cond. Probability', fill: "var(--bs-body-bg)"})),
      Plot.frame(),
    ],
    width: width,
  })
  return plot_1
}
{
  let Prob_data = []
  let cond_prob_x_k = 0
  let prob_k = 1
  let prob_x_k = 0
  for (var k=0; k<=99; k++){
    cond_prob_x_k = p_ + (1-p_)*k/100
    prob_x_k = cond_prob_x_k * prob_k
    prob_k = prob_k*(1-cond_prob_x_k)
    Prob_data.push({'Pity': k, 'Cond. Probability': cond_prob_x_k, 'Joint Probability': prob_x_k, 'Probability(k)': prob_k})
  }
  const plot_2 = Plot.plot({
    x: { label: 'Pity, (k)', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) },
    y: { label: 'Joint Probability, P(x ∩ k)', domain: [0, 0.1] },
    marks: [
      Plot.ruleX(Prob_data, {x: 'Pity', y: 'Joint Probability', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}),
      Plot.tip(Prob_data, Plot.pointerX({x: 'Pity', y: 'Joint Probability', fill: "var(--bs-body-bg)"})),
      Plot.frame(),
    ],
    width: width,
  })
  return plot_2
}
{
  let Prob_data_sum = []
  let cond_prob_x_k = 0
  let prob_k = 1
  let prob_x_k = 0
  let prob_x_k_sum = 0
  for (var k=0; k<=99; k++){
    cond_prob_x_k = p_ + (1-p_)*k/100
    prob_x_k = cond_prob_x_k * prob_k
    prob_k = prob_k*(1-cond_prob_x_k)
    Prob_data_sum.push({'Pity': k, 'Probability': prob_x_k_sum})
    prob_x_k_sum += prob_x_k
  }
  const plot_3 = Plot.plot({
    x: { label: 'Pity, (k)', domain: [-2, 102], ticks: d3.ticks(0, 100, 20) },
    y: { label: 'Cumulative Probability, P(x ∩ k)', domain: [0, 1] },
    marks: [
      Plot.ruleX(Prob_data_sum, {x: 'Pity', y: 'Probability', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}),
      Plot.tip(Prob_data_sum, Plot.pointerX({x: 'Pity', y: 'Probability', fill: "var(--bs-body-bg)"})),
      Plot.frame(),
      Plot.ruleY([0.25], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}),
      Plot.ruleY([0.5], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}),
      Plot.ruleY([0.75], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}),
    ],
    width: width,
  })
  return plot_3
}
```
:::
::: {style="display: inline-block; margin-left: -100%; position: relative; left: 60%; top: -15px"}
```{ojs}
viewof p_ = Inputs.range(
  [0.001, 0.1], {
    step: 0.001,
    value: 0.001,
    label: html`1st Prob. <span style="text-decoration: underline">p</span>`,
    id: 'p_'
})
```
:::
:::

One can see that the initial probability does not have a comparatively large effect, as the increases in the probability are linear and quickly become larger than the initial probability. This is due to the maximum number of pulls being capped (here at 100); if we increased this cap, the changes in probability would be smaller, and the joint probability distribution would be overall flatter. The joint probability here is:
$$P(x,k) = P(x|k)P(k) = \begin{cases} \underline{p} & \text{for } k=0 \\ \left[\underline{p} + \frac{1-\underline{p}}{N-1} k \right] \Pi_{i=0}^{k-1} \left(1-\underline{p} - \frac{1-\underline{p}}{N-1} i\right) & \text{for } k>0 \end{cases}$$
This yields the expected number of pulls for 1 success as:
$$\begin{aligned}\bar{n} &= \sum_{n=1}^{N} n p(x|k=n-1) p(k=n-1) \\&= \underline{p}+ \sum_{n=2}^{N} n \left[\underline{p} + \frac{1-\underline{p}}{N-1} (n-1) \right] \left[\Pi_{i=0}^{n-2} \left(1-\underline{p} - \frac{1-\underline{p}}{N-1} i\right) \right]\end{aligned}$$
The product term can be simplified to:
$$\Pi_{i=0}^{n-2} \left(1-\underline{p} - \frac{1-\underline{p}}{N-1} i\right) = \Pi_{i=0}^{n-2} (1-\underline{p}) \left(\frac{N-1-i}{N-1}\right) = \left(\frac{1-\underline{p}}{N-1}\right)^{n-1} \frac{(N-1)!}{(N-n)!}$$
Substituting in and simplifying:
$$\bar{n} = \underline{p} + \sum_{n=2}^{N} n \left[\underline{p} + \frac{1-\underline{p}}{N-1} (n-1) \right] \left(\frac{1-\underline{p}}{N-1}\right)^{n-1} \frac{(N-1)!}{(N-n)!}$$
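A sketch to check that this closed form matches a direct evaluation of the sum, under assumed values of $N$ and $\underline{p}$:

```python
import math

N, p_low = 100, 0.01  # assumed values, matching the figure above

def p_cond(k):
    # conditional probability with zero soft pity: linear rise from p_low
    return p_low + (1 - p_low) * k / (N - 1)

# Direct evaluation: sum over n of n * P(success on pull n)
n_bar_direct = 0.0
survival = 1.0
for n in range(1, N + 1):
    n_bar_direct += n * p_cond(n - 1) * survival
    survival *= 1 - p_cond(n - 1)

# Closed form from the text
n_bar_closed = p_low + sum(
    n * (p_low + (1 - p_low) * (n - 1) / (N - 1))
    * ((1 - p_low) / (N - 1)) ** (n - 1)
    * math.factorial(N - 1) / math.factorial(N - n)
    for n in range(2, N + 1)
)
print(n_bar_direct, n_bar_closed)  # the two values should agree
```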
We could try to compare this with the 50/50 system, but this is also a bit of a mess, so let's think about this another way. Let $F_r \sim H(r, p)$ be the number of pulls needed to see the $r^{th}$ success in a Bernoulli process with probability of success $p$, where $p$ may depend on the pity (the number of failures since the last success). Also let $Y_i \sim G(p)$, where $Y_i$ represents the number of pulls needed for the $i^{th}$ success after the $(i-1)^{th}$; that is, $Y_i$ is the number of pulls between the $(i-1)^{th}$ and the $i^{th}$ successes. Thus, we can think of $F_r$ as the sum of the first $r$ of these:
$$F_r = Y_1 + Y_2 + \cdots + Y_r$$
In the case here of a 50/50 with a guarantee after a 50/50 loss, $r=2$ if one loses the 50/50, and $r=1$ if one wins it. Due to this sum, it is easy to find the expected value, as it is just the sum of the expected values of these variables:
$$E[F_r] = \begin{cases} E[Y_1] & \text{if you win the 50/50} \\ E[Y_1] + E[Y_2] & \text{if you lose the 50/50}\end{cases}$$
As $Y_1$ and $Y_2$ follow the same distribution (the pity resets after each success), the expected number of pulls before 2 successes is twice the expected number of pulls before 1 success. This is the same as under a negative binomial distribution, in which the probabilities are constant. Importantly, I did not specify the conditional probability distribution here (which is $G$), therefore this applies to any valid distribution, including those above with varying starting probabilities and levels of soft pity and hard pity.
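This claim is easy to verify numerically. Below is a sketch that simulates $E[Y_1]$ and $E[Y_1+Y_2]$ under an arbitrary, made-up pity schedule; the ratio should be close to 2 regardless of the schedule chosen:

```python
import random

rng = random.Random(42)

def pulls_for_one_success(p_of_k):
    """Pulls until one success, where p_of_k gives the pity-dependent probability."""
    k = 0
    while True:
        if rng.random() < p_of_k(k):
            return k + 1
        k += 1

# An arbitrary schedule: 2% flat, soft pity at 50, hard pity at pull 80
def p_of_k(k):
    if k < 50:
        return 0.02
    return min(1.0, 0.02 + (1 - 0.02) * (k + 1 - 50) / (80 - 50))

one = [pulls_for_one_success(p_of_k) for _ in range(200_000)]
two = [pulls_for_one_success(p_of_k) + pulls_for_one_success(p_of_k)
       for _ in range(200_000)]
print(sum(two) / len(two) / (sum(one) / len(one)))  # ratio is close to 2
```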
## Optimal Pricing

Typically, as economists, when faced with uncertain outcomes, we use expected values to inform us of what the price *should* be. In the case of gacha, we don't want to focus on the expected value of a single pull, but rather on the expected number of pulls one would need to obtain the desired items. This is easy, as I've already calculated this. A key question is whether probabilistic pricing can be an optimal pricing strategy, and under what conditions. The first model shown here will have the pity system (both soft and hard), but not contain a 50/50. The second model will introduce the 50/50 back in. The third model (if I can be bothered) will contain the spark system. Preliminary analysis shows that a probabilistic pricing strategy is a form of price discrimination (perhaps even perfect). Compare gacha to a fixed price system: with a fixed price, those that are willing to pay this price will do so, and anyone who is not willing to pay this price will not purchase the item. With a gacha system, those that are willing to pay the expected cost will still do so, but those that are not willing to pay the expected price may still be willing to pay something for a chance at getting the item. This is how a gacha system can capture a larger proportion of consumers' budgets (this ignores possible risk aversion, but the logic is sound).

In the usual tradition, to find profit we need both costs and revenues. Game development costs are typically upfront and fixed, and subsequent costs after release are things like server costs and maintenance. These variable costs are likely difficult to parse out, but for modelling purposes let us assume some fixed marginal cost per successful purchase. We could also add a smaller cost per pull for the marginal server costs used to generate the pull, but this would be an aside. The revenue clearly depends on the number of pulls, but also on the price of each pull, the probabilities of success, and the value of the items. For optimal pricing we want to compare a gacha system to a fixed price system, where this fixed price is equal to the cost of the expected number of pulls for success under the gacha system. I will assume that the firm is a monopolist, as they are the only provider of a game whose contents are sufficiently differentiated. Gacha games do have competition from similar games (both gacha and non-gacha), and this will affect the price that they can set, but (I think) the effect will be equal under either price system, so it can be ignored.

To calculate the revenue under the gacha system, we need to categorise the player-base by their willingness to pay. Call the set of active players $A$, with size $\bar{A}$, where player $a$ has a valuation for the currently available item denoted $v_a$. The valuations are in terms of the number of pulls they are willing to do for the item. Players' willingness to pay depends on their ability to pay, so denote each player's budget as:
$$b_a = z_a + w_a$$
where $w$ is the in-game currency gained from playing the game, and $z$ is the real money converted to in-game currency used to purchase pulls. To be consistent with the valuations above, both $w$ and $z$ are in terms of numbers of pulls. The in-game and real-money currency amounts can be inferred from this using appropriate exchange rates (i.e. by multiplying by the in-game price of a single pull). We can use these budgets as our categories: players with zero real money budget $(z_a=0)$ are known as free-to-play (F2P); players with a positive real money budget $(z_a>0)$ are known as pay-to-play (P2P). We can further categorise the P2P players into five categories: those with a budget less than the expected number of pulls for success $(z_a < E[Y])$; those with a budget larger than the expected number of pulls for success but lower than the hard pity $(E[Y]\le z_a<Y_{max})$; those with a budget large enough to guarantee a success but less than two expected successes $(Y_{max}\le z_a < 2E[Y])$; those with a budget larger than two expected successes but not large enough to guarantee two successes $(2E[Y]\le z_a < 2Y_{max})$; and those with a budget large enough to guarantee two successes or more $(z_a \ge 2Y_{max})$. These categories are often labelled using aquatic animals; refer to @tbl-budget-categories for this information.

| Category | Budget | Description |
|----------|--------|-------------|
| F2P | $z_a=0$ | Free-to-play |
| Minnow | $0 < z_a < E[Y]$ | Small budget, less than the expected number of pulls for success |
| Dolphin 1 | $E[Y] \le z_a < Y_{max}$ | Budget large enough to expect a success but not guarantee one |
| Dolphin 2 | $Y_{max} \le z_a < 2E[Y]$ | Budget large enough to guarantee a success but less than two expected successes |
| Whale | $2E[Y] \le z_a < 2Y_{max}$ | Budget larger than two expected successes, but not large enough to guarantee two successes |
| Giga-Whale | $2Y_{max} \le z_a$ | Budget large enough to guarantee two successes or more |

: Budget Categories for Players {#tbl-budget-categories}

The reason for categorising the players like this is that their budgets will affect the number of pulls they can do, and therefore the probability of success $P(x|N)$. It can also show us which players could not afford the item if it had a fixed price: if the fixed price is equal to the expected number of pulls for success $E[Y]$, then players cannot afford the item if $b_a < E[Y]$. For example, if $w_a=\frac{1}{2}E[Y]$, then a player needs at least $z_a=\frac{1}{2}E[Y]$ to afford the item. Under the gacha system, players with a budget lower than the expected cost of the item still have a chance to obtain it, but under a fixed price system they do not. We can summarise the relevant variables for each player $a$ as $(v_a, w_a, z_a)$.
The revenue for the firm is then the sum of the revenue from each player, which means that we need to specify distributions for these variables. As the players' valuations of the items are subjective and unknown to the firm, the simplest assumption we can make is that they are uniformly distributed between 0 and the hard pity ($N=100$). The budget variables are observable to the firm, but the observation is not realised until purchase. As the firm must set the price before the release of the game, they must make assumptions about the distribution of these variables.

### Real Money Budget

The distribution of the real money budget should not be uniform, as there is a large proportion of players for which $z_a=0$, and the density of players should decrease as the budget increases. Therefore, assume that the distribution of the real money budget is a Pareto Type II distribution (Lomax) with scale parameter $\lambda$ and shape parameter $\alpha$. This distribution is defined as:
$$f(z) = \frac{\alpha \lambda^\alpha}{(z+\lambda)^{\alpha+1}} \quad \alpha, \lambda > 0$$
where $z \ge 0$. The proportion of F2P players is $f(0) = \frac{\alpha}{\lambda}$, so if we wish to set the proportion of F2P players to a quarter of the player-base, then we should set $\alpha = \frac{\lambda}{4}$. As I have no data to inform me of appropriate values, I'll make an 'educated' guess and set $\lambda=8$ and $\alpha=2$. Perhaps I can use [@RePEc:tse:wpaper:128768] to inform better values, but they use a different functional form of the Pareto distribution. Using these values, the density function is:
$$f(z) = \frac{2\times 8^2}{(z+8)^{3}} = 128(z+8)^{-3}$$
As we want the distribution to be between $[0, 100]$, we need to normalise it by dividing by $F(100)$. We will denote this normalised distribution as $Lomax_{100}(\lambda, \alpha)$. This gives us enough information to surmise the distribution for the in-game budget, but for ease let us assume that players' valuations are independent of their real-money budgets (maybe not realistic, but definitely much easier to deal with).
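A minimal sketch of this truncated Lomax distribution, using the 'educated guess' parameters above (the helper names are mine):

```python
import numpy as np

lam, alpha = 8.0, 2.0  # 'educated guess' parameters from the text

def lomax_pdf(z):
    return alpha * lam**alpha / (z + lam) ** (alpha + 1)

def lomax_cdf(z):
    return 1 - (1 + z / lam) ** (-alpha)

# Truncate to [0, 100] by renormalising with F(100)
F100 = lomax_cdf(100.0)
z = np.linspace(0, 100, 100_001)
pdf_truncated = lomax_pdf(z) / F100

# The truncated density integrates to 1, and f(0) = alpha/lam before truncation
print(np.trapz(pdf_truncated, z))  # ≈ 1
print(lomax_pdf(0.0))              # alpha/lam = 0.25
```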
### In-game Currency

The simplest assumption is that the in-game budget is uniformly distributed between some minimum and maximum values for the in-game currency $[w_{min}, w_{max}]$, but this would likely not be appropriate, as the distribution depends on decisions made by the players. We can assume that the firm knows this minimum and maximum in-game currency, and for simplicity assume the players know this too (in reality they don't, but they can likely make a reasonable estimate). Assume that earning in-game currency has a fixed marginal cost of $c$ per pull (likely actually increasing, but oh well), which is effectively the combined cost of effort and time (opportunity cost), minus any benefits, as earning in-game currency often means playing the game, which is (hopefully) fun. This effectively reduces the valuation of the item by adding a cost based on how much in-game currency you need to earn. This is what gives the incentive to use real money: it is, in a sense, cheaper, as it does not reduce your valuation.

Let player $a$ have a valuation of the item of $v_a$ and a real-money budget of $z_a$; the decision they need to make is how much in-game currency $w_a$ they want to earn. The optimal choice here is to earn no more than how much they value the item above their existing real-money budget, which implies (assuming no discounting, or assuming it is accounted for in the valuations):
$$b_a = z_a + w_a = v_a - c\hat{w}_a \quad \text{where } \hat{w}_a = w_a - w_{min}$$
Substituting in for $w_a$:
$$b_a = z_a + w_{min} + \hat{w}_a = v_a - c\hat{w}_a$$
$$\Rightarrow \hat{w}_a = \frac{v_a - z_a - w_{min}}{1 + c}$$
If we assumed a convex marginal cost of earning in-game currency, then the budget would be:
$$b_a = z_a + w_{min} + \hat{w}_a = v_a - c\hat{w}_a^2$$
$$c\hat{w}_a^2 + \hat{w}_a - (v_a - z_a - w_{min}) = 0$$
This would imply that the in-game currency should be equal to:
$$\Rightarrow \hat{w}_a = \frac{-1 + \sqrt{1 + 4c(v_a - z_a - w_{min})}}{2c}$$
I am going to assume the linear case for simplicity from now on. Also, it should be noted that the above equation should be within the possible bounds for $w$:
$$\hat{w}_a = \min\left(\frac{v_a - z_a - w_{min}}{1 + c}, \hat{w}_{max}\right)\quad \text{where } \hat{w}_{max} = w_{max} - w_{min}$$
@fig-in-game-currency-distributions shows this in-game currency budget as a function of the real money budget, and one can move the valuation to show the in-game currency budget increase with it. The top plot shows the negative relationship between the two budgets, and the bounds of the in-game currency budget. The density of the distribution at the determined value of $v_a$ is shown in the bottom plot. (I might make a 3d plot of this)

```{ojs}
viewof v_a_fig_igcd = Inputs.range(
  [0, 100], {
    step: 1,
    value: 70,
    label: 'Valuation of Item, v_a',
    id: 'v_a'
})
viewof c_fig_igcd_lin = Inputs.range(
  [0, 1], {
    step: 0.01,
    value: 0.1,
    label: 'In-Game Currency Linear Effort cost, c',
    id: 'c_lin'
})
viewof c_fig_igcd_quad = Inputs.range(
  [0, 0.2], {
    step: 0.001,
    value: 0.01,
    label: 'In-Game Currency Convex Effort cost, c',
    id: 'c_quad'
})
```

```{ojs}
//| label: fig-in-game-currency-distributions
//| fig-cap: 'Distributions of In-Game Currency Budget'
{
  const w_min = 20
  const w_max = 70
  const w_max_hat = (w_max - w_min)
  const c_lin = c_fig_igcd_lin
  const c_quad = c_fig_igcd_quad
  const w_a_data = d3.range(0, 100, 0.01).map(z => ({z: z, w: w_min + Math.max(Math.min((v_a_fig_igcd - z - w_min)/(1 + c_lin), w_max_hat), 0)}))
  const w_a_convex_data = d3.range(0, 100, 0.01).map(z => ({z: z, w: w_min + Math.max(Math.min((-1+Math.sqrt(1+4*c_quad*(v_a_fig_igcd - z - w_min)))/(2*c_quad), w_max_hat), 0)}))
  const plot_1 = Plot.plot({
    x: { label: 'Real Money Budget, z', domain: [0, 100], },
    y: { label: 'In-Game Currency, w', domain: [0, 80], },
    color: { legend: true, domain: ['w', 'w_convex'], range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)'] },
    marks: [
      Plot.line(w_a_data, {x: 'z', y: 'w', stroke: 'var(--plot-rule-color-1)'}),
      Plot.line(w_a_convex_data, {x: 'z', y: 'w', stroke: 'var(--plot-rule-color-2)'}),
      Plot.frame()
    ],
    width: width
  })
  return html`${plot_1}`
}
```

The above graph basically just shows that there is very little difference between the linear and convex cases, except in the magnitude of the cost. The convex case will increase the cost to a level where the maximum in-game currency won't be reached. Whilst the convex case may be more realistic, the linear case is much easier to deal with.
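For concreteness, here is a sketch of the two optimal in-game-currency rules, assuming the illustrative bounds $w_{min}=20$ and $w_{max}=70$ used in the figure code:

```python
import math

W_MIN, W_MAX = 20, 70  # bounds assumed in the figure above

def w_hat_linear(v, z, c):
    """Optimal extra in-game currency with linear earning cost c."""
    unbounded = (v - z - W_MIN) / (1 + c)
    return max(0.0, min(unbounded, W_MAX - W_MIN))

def w_hat_convex(v, z, c):
    """Optimal extra in-game currency with quadratic earning cost c*w_hat^2."""
    slack = v - z - W_MIN
    if slack <= 0:
        return 0.0
    unbounded = (-1 + math.sqrt(1 + 4 * c * slack)) / (2 * c)
    return min(unbounded, W_MAX - W_MIN)

# A player valuing the item at 70 pulls with a real-money budget of 10 pulls
print(w_hat_linear(70, 10, 0.1))   # ≈ 36.36 extra pulls earned
print(w_hat_convex(70, 10, 0.01))  # ≈ 30.62, slightly less under convex cost
```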
So, for simplicity, I will use the linear case going forward.

### Distribution of Variables

I will start with these two simpler distributions, as they are convenient, but it may be more accurate for the valuations to follow a distribution similar to the Pareto, or some truncated normal distribution. To summarise:
$$v_a \sim U(0, N) \quad w_a \sim G(w_{min}, w_{max}, \lambda, \alpha, c) \quad z_a \sim Lomax(\lambda, \alpha)$$
Further, these distributions are all for continuous random variables, whereas the measurement scale for the variables is the number of pulls, which is discrete. As the continuous distributions are easier to deal with, I will overlook this: the variables represent conversions from real money into a discrete number of pulls, so we can think of them as continuous, and understand that a value of 15.8 pulls, for example, is equivalent to $15.8 \times \text{price of a single pull}$ in real money. One should also note that a player could have a non-integer valuation, but the realisation of this (in terms of the number of pulls desired) would need to be rounded down to the nearest integer. To picture this, these distributions are shown in @fig-budget-distributions. The distribution shown for the in-game currency is descriptive of the real distribution, which is not well defined but whose shape was found through simulation. The actual distribution found through 10,000 simulations is shown in the appendix.

```{ojs}
//| label: fig-budget-distributions
//| fig-cap: 'Distributions of Values and Budget Variables'
{
  const w_min = 20
  const w_max = 70
  const F_100 = 1 - (1 + 100/8)**(-2)
  const lomax = (z, lambda, alpha) => (alpha * lambda**alpha / (z + lambda)**(alpha + 1)) / F_100
  const lomax_data = d3.range(0, 100, 0.01).map(z => ({z: z, density: lomax(z, 8, 2)}))
  const plot_1 = Plot.plot({
    x: { label: 'Number of Pulls', domain: [0, 100], },
    y: { label: 'Density', domain: [0, 0.35], },
    color: { legend: true, domain: ['z', 'v', 'w'], range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)', 'var(--plot-rule-color-3)'] },
    marks: [
      Plot.line(lomax_data, {x: 'z', y: 'density', stroke: 'var(--plot-rule-color-1)'}),
      Plot.ruleY([0.01], {stroke: 'var(--plot-rule-color-2)'}),
      Plot.line([[w_min, 0.53/(w_max-w_min)], [w_max, 0.53/(w_max-w_min)]], {stroke: 'var(--plot-rule-color-3)'}),
      Plot.line([[w_min, 0], [w_min, 0.28]], {stroke: 'var(--plot-rule-color-3)'}),
      Plot.line([[w_max, 0], [w_max, 0.19]], {stroke: 'var(--plot-rule-color-3)'}),
      Plot.frame()
    ],
    width: width
  })
  return plot_1
}
{
  const w_min = 20
  const w_max = 70
  const F_100 = 1 - (1 + 100/8)**(-2)
  const cum_lomax = (z, lambda, alpha) => (1 - (1 + z/lambda)**(-alpha)) / F_100
  const cum_w = function(w) {
    if (w<w_min) { return 0 }
    else if (w>w_max) { return 1 }
    else { return 0.53 /(w_max-w_min) * (w-w_min) + 0.28 }
  }
  const cum_lomax_data = d3.range(0, 100, 0.01).map(z => ({z: z, density: cum_lomax(z, 8, 2)}))
  const cum_v_data = d3.range(0, 100, 0.01).map(v => ({v: v, density: v/100}))
  const cum_w_data = d3.range(0, 100, 0.01).map(w => ({w: w, density: cum_w(w)}))
  const plot_2 = Plot.plot({
    x: { label: 'Number of Pulls', domain: [0, 100], },
    y: { label: 'Cumulative Density', domain: [0, 1], },
    color: { legend: true, domain: ['z', 'v', 'w'], range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)', 'var(--plot-rule-color-3)'] },
    marks: [
      Plot.line(cum_lomax_data, {x: 'z', y: 'density', stroke: 'var(--plot-rule-color-1)'}),
      Plot.line(cum_v_data, {x: 'v', y: 'density', stroke: 'var(--plot-rule-color-2)'}),
      Plot.line(cum_w_data, {x: 'w', y: 'density', stroke: 'var(--plot-rule-color-3)'}),
      Plot.ruleY([0.25, 0.5, 0.75], {stroke: 'var(--bs-body-color)', strokeDasharray: '5,5'}),
      Plot.crosshairX(cum_lomax_data, {x: 'z', y: 'density', textFill: 'var(--plot-rule-color-1)', textStroke: 'var(--bs-body-bg)', ruleStroke: 'var(--plot-rule-color-1)'}),
      Plot.crosshairX(cum_v_data, {x: 'v', y: 'density', textFill: 'var(--plot-rule-color-2)', textStroke: 'var(--bs-body-bg)', ruleStroke: 'var(--plot-rule-color-2)'}),
      Plot.crosshairX(cum_w_data, {x: 'w', y: 'density', textFill: 'var(--plot-rule-color-3)', textStroke: 'var(--bs-body-bg)', ruleStroke: 'var(--plot-rule-color-3)'}),
      Plot.frame()
    ],
    width: width
  })
  return plot_2
}
```

One can see from this parameterisation that the median real money budget is 3.3 pulls, whilst the median in-game currency budget is 41 pulls. I set the median valuation as 50 pulls, which means that the median player would not value the item more than the in-game currency that they could earn. Below I have combined the two budgets into the full budget. This was found by simulation again and then stylised in the below diagram. The valuations and cost of earning in-game currency are also shown.

```{ojs}
//| label: fig-full-budget-and-valuation-distribution
//| fig-cap: 'Distribution of full budget and valuation'
{
  const w_min = 20
  const w_max = 70
  // b = z + w_min + w_hat = v - c*w_hat
  // Between w_min and w_max: lomax + uniform (around v=0.01)
  // After w_max: steeper lomax
  const lomax = (z, lambda, alpha) => (alpha * lambda**alpha / (z + lambda)**(alpha + 1))
  const lomax_data = []
  for (let z=0; z<100; z+=0.01) {
    if (z<w_min) { lomax_data.push({z: z, density: 0}) }
    else if (z<w_max) { lomax_data.push({z: z, density: lomax(z-w_min, 2.5, 0.09) + 0.01}) }
    else { lomax_data.push({z: z, density: lomax(z-w_max, 1.5, 0.09)}) }
  }
  const plot = Plot.plot({
    x: { label: 'Number of Pulls', domain: [0, 100], },
    y: { label: 'Density', domain: [0, 0.1], },
    color: { legend: true, domain: ['b', 'v'], range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)'] },
    marks: [
      Plot.line(lomax_data, {x: 'z', y: 'density', stroke: 'var(--plot-rule-color-1)'}),
      Plot.ruleY([0.01], {stroke: 'var(--plot-rule-color-2)'}),
      // Plot.line([[w_min, 0.53/(w_max-w_min)], [w_max, 0.53/(w_max-w_min)]], {stroke: 'var(--plot-rule-color-3)'}),
      // Plot.line([[w_min, 0], [w_min, 0.32]], {stroke: 'var(--plot-rule-color-3)'}),
      // Plot.line([[w_max, 0], [w_max, 0.15]], {stroke: 'var(--plot-rule-color-3)'}),
      Plot.frame()
    ],
    width: width
  })
  return plot
}
```

@fig-full-budget-and-valuation-distribution shows two spikes in density, around the minimum in-game currency and around the maximum in-game currency. The first spike can be characterised as the proportion of players that have a real-money budget at least as large as their valuation of the item, and therefore have no need for any further in-game currency above the minimum. The second spike can be characterised as the proportion of players who not only have a real-money budget less than their valuation of the item, but sufficiently less that they need at least the maximum available in-game currency to be willing to pay their valuation. After each of these spikes, the density falls in a convex manner, but at varying rates. After the first spike, the density falls slower and converges to a value close to the density of the valuations (remember that this is uniform).
After the second spike, the density falls much faster and seems to converge towards zero (which is necessary, as success is guaranteed after 100 pulls, so it makes no sense to have a budget larger than that).

### Fixed Price

To illustrate a fixed price system, assume the price of the item is $q$ pulls. As valuations are uniform on $[0, 100]$, a proportion $1-\frac{q}{100}$ of the consumers would value the item more than the price, and $\frac{q}{100}$ would value it less. Therefore, the firm's revenue would be the sum of the difference between the price and the in-game currency budget for the players that value the item more than the price, excluding those who are not willing to pay real money to make up the difference:
$$\Pi_{\text{Fixed}} = \sum_{a \in A} \begin{cases} q - w_a & \text{if } v_a \ge q \text{ and } z_a \ge q - w_a \\ 0 & \text{otherwise}\end{cases}$$
The higher the firm sets the price, the more money they make per player, but fewer players will be willing to pay. If they lower the price below the maximum in-game currency, then they will effectively sell to some players for free, but this will increase the number of players willing to pay. If the proportion of players that have the maximum in-game currency is high, then the firm will never want to set the price below this amount. Given the discontinuous distributions, finding a closed-form solution for the price is difficult. What one can do is take the previous simulations and calculate the revenue for each player at different prices to find a maximum. The result of this is a profit-maximising price of 66 pulls, which averages a revenue of 0.73 pulls per player.
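A sketch of this grid search, re-sampling from the assumed distributions (the paper's own sample lives in the appendix, so the numbers here will not match it exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
N_PLAYERS = 100_000
W_MIN, W_MAX, C = 20, 70, 0.1
LAM, ALPHA = 8.0, 2.0

# Sample valuations and real-money budgets from the assumed distributions
v = rng.uniform(0, 100, N_PLAYERS)
u = rng.uniform(0, 1, N_PLAYERS) * (1 - (1 + 100 / LAM) ** -ALPHA)
z = LAM * ((1 - u) ** (-1 / ALPHA) - 1)          # truncated Lomax via inverse CDF
w_hat = np.clip((v - z - W_MIN) / (1 + C), 0, W_MAX - W_MIN)
w = W_MIN + w_hat

# Revenue at each candidate fixed price q: q - w per sale, floored at zero
# (sales below the in-game budget are 'effectively free' in real-money terms)
prices = np.arange(1, 101)
revenue = [
    np.mean(np.where((v >= q) & (z >= q - w), np.maximum(q - w, 0), 0.0))
    for q in prices
]
best = prices[int(np.argmax(revenue))]
print(best, max(revenue))  # best fixed price and average revenue per player
```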
### Gacha Price - Consumer decision making

This may be the actual approach I need to take to get a profit function, as just looking at the distributions will not get me there. The distributions also don't tell us about the actual pairs of budgets and valuations each consumer has, which is what actually matters when making the decision to pull. The consumer's decision will be based on how likely they think they are to win the item, how large their budget is, and how much they value the item. Using the equations found above, we can construct the expected utility function for each player. First, define the utilities for success and failure:
$$U_{Gacha}(\text{success}) = v_a + u_z(z_a - \Delta z_a) + u_w(w_a -\Delta w_a) - c(\hat{w}_a)$$
$$U_{Gacha}(\text{failure}) = u_z(z_a - \Delta z_a) + u_w(w_a -\Delta w_a) - c(\hat{w}_a)$$
where $\Delta z_a$ and $\Delta w_a$ are the real-money and in-game currency budgets spent on the pulls; $u_z(\cdot)$ and $u_w(\cdot)$ are the utilities of the corresponding remaining budgets; and $c(\hat{w}_a)$ is the cost of earning in-game currency. Here we assume that the utility of the item is the player's valuation of it, and that the in-game and real-money currencies have some utility based on possible alternative expenditure (this will be made clearer and more explicit when future items are introduced). The expected utility for each player (where $x$ is success) is then:
$$E(U_{Gacha}) = Pr(x|b_a) \times v_a + u_z(z_a - E(\Delta z_a)) + u_w(w_a - E(\Delta w_a)) - c(\hat{w}_a)$$
Here, the probability of success is conditional on the budget, and multiplying it by the valuation gives the expected valuation of the item. This probability of success is:
$$P(x|b_a) = \sum_{k=0}^{b_a-1} P(x\cap k|b_a)$$
The expected utility will then be this expected valuation plus the utility of the expected remaining budgets, minus the expected cost of earning in-game currency. $E(\Delta z_a)$ and $E(\Delta w_a)$ are the expected real-money and in-game currency budgets spent on the pulls, respectively, which will depend on the probability distribution for success. It is important to state at some point that we will assume that in-game currency is spent before real-money currency, therefore $E(\Delta w_a) \ge E(\Delta z_a)$ and $\Delta w_a \ge \Delta z_a$. I will also state that the decision to earn in-game currency is based on the player's valuation of the item, not on how much they expect to need, as they can always earn more in-game currency if they want to. The expected number of pulls to get 1 success, conditional on a budget of $b_a$ pulls (the densities of which have been shown previously in @fig-joint-density-n), is:
$$E(\bar{n}|x, b_a) = 1 + \frac{\sum_{i=0}^{b_a-1} i\, P(x \cap i|b_a)}{\sum_{j=0}^{b_a-1} P(x \cap j|b_a)}$$
where $P(x \cap k|b_a)$ is given by @eq-joint-prob-n. This expected number of pulls for 1 success determines the expected budget used on the pulls. As the in-game currency is used first, we can state that:
$$\text{if } E(\bar{n}|x, b_a) \lt w_a \text{ then } E(\Delta w_a) = E(\bar{n}|x, b_a) \text{ and } E(\Delta z_a) = 0$$
and
$$\text{if } E(\bar{n}|x, b_a) \ge w_a \text{ then } E(\Delta w_a) = w_a \text{ and } E(\Delta z_a) = E(\bar{n}|x, b_a) - w_a$$
Therefore, the expected utility function can be written as:
$$E(U_{Gacha}) =\begin{cases} v_a \sum_{k=0}^{b_a-1} P(x\cap k|b_a) + u_z(z_a) + u_w(w_a - E(\bar{n}|x, b_a)) - c(\hat{w}_a) &\text{if } E(\bar{n}|x, b_a) \lt w_a \\ v_a \sum_{k=0}^{b_a-1} P(x\cap k|b_a) + u_z(z_a + w_a - E(\bar{n}|x, b_a)) + u_w(0) - c(\hat{w}_a) & \text{if } E(\bar{n}|x, b_a) \ge w_a\end{cases}$$
What can be seen is that if the utility functions are linear, then both of these equations are the same: if your utility is the sum of your existing budgets, then the breakdown of those budgets doesn't matter at all. However, if these utility functions are different, implying that the in-game currency budget and the real-money budget have different uses, then the breakdown does matter. In most(?) gacha games the purchasable currency (which converts into in-game currency for pulls) can be used to purchase other secondary items, such as cosmetics, and there are often purchasable items which do not use the in-game currency, such as battle passes, so the assumption that the utility functions for each budget are different is likely an accurate one. However, whether this matters significantly is hard to tell, and it may lead to unnecessary complexity when I am only assessing the gacha system itself and not secondary revenue methods. Therefore, let us ignore this assumption and use a single utility function $u$ for the entire budget, treating both budgets as perfectly substitutable. Meaning that:
$$E(U_{Gacha}) = v_a \sum_{k=0}^{b_a-1} P(x\cap k|b_a) + u(z_a + w_a - E(\bar{n}|x, b_a)) - c(\hat{w}_a)$$
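A sketch of these two budget-conditional quantities, $P(x|b_a)$ and $E(\bar{n}|x, b_a)$, under the assumed pity parameters used earlier:

```python
P_LOW, K_SOFT, K_HARD = 0.01, 74, 89  # assumed pity parameters, as before

def p_cond(k):
    if k < K_SOFT:
        return P_LOW
    return P_LOW + (1 - P_LOW) * (k + 1 - K_SOFT) / (K_HARD + 1 - K_SOFT)

def joint(b):
    """P(x ∩ k | b) for k = 0..b-1: success at pity k within a budget of b pulls."""
    out, survival = [], 1.0
    for k in range(b):
        out.append(p_cond(k) * survival)
        survival *= 1 - p_cond(k)
    return out

def p_success_given_budget(b):
    return sum(joint(b))

def expected_pulls_given_success(b):
    j = joint(b)
    return 1 + sum(k * jk for k, jk in enumerate(j)) / sum(j)

print(p_success_given_budget(30))        # small budget: success far from certain
print(expected_pulls_given_success(90))  # full budget: mean pulls given success
```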
Before evaluating this, we should consider the utility of not partaking in the gacha system at all, and thereby not spending any of the budget. This is needed as obtaining the item is not always guaranteed (depending on the size of the budget), and therefore the player may be better off not spending any of their budget at all. The utility of not partaking in the gacha system is simply the utility of the budget:
$$E(U_{No\space Gacha}) = u(z_a + w_a) - c(\hat{w}_a)$$
This means that the player will do the gacha if:
$$E(U_{Gacha}) \ge E(U_{No\space Gacha})$$
$$\Rightarrow v_a \sum_{i=0}^{b_a-1} P(x\cap i|b_a) \ge u(z_a + w_a) - u(z_a + w_a - E(\bar{n}|x, b_a))$$
In the special case that the utility of the budget is linear:
$$v_a \sum_{k=0}^{b_a-1} P(x\cap k|b_a) \ge u(E(\bar{n}|x, b_a))$$
This merely implies that the expected utility of earning the item would need to be greater than the expected cost. This condition is unlikely to be binding in the case where we only have a single item, as the budget then has no use other than gacha, so the utility of the remaining budget will be zero. However, all gacha games are 'live service' games, meaning they are constantly adding new items, so the remaining budget will always have utility. Adding future items into this decision-making process will be discussed in the next section. The solution for a case with no future items is to spend the entire budget, as there is no utility from any remaining budget. We can therefore simplify the expected utility function to:
$$E(U_{Gacha}) = v_a \sum_{k=0}^{b_a-1} P(x\cap k|b_a) - c(\hat{w}_a)$$
As the budget is already determined, as long as this expected utility is positive, the player will spend as much of the budget as possible until they either get the item or run out of currency. From the perspective of the firm, gacha ensures that nearly every player will engage with the system, as not engaging requires a very large cost of earning the in-game currency (assuming they want the item) and a really low valuation of the item. The customers that still don't engage with the gacha system wouldn't buy the item at a fixed price either: their valuations are low and, given that their in-game currency budget is high, their real money budget is also low, so the revenue loss from these customers would likely be trivial. In order to maximise profits, the firm can alter the probabilities. Note that the firm does not care whether the player is successful or not, as the cost of a success to them is negligible; what they care about is how many pulls the player makes. Therefore, the firm will want to reduce the probability of success (conditional on the pity being less than soft pity) to increase the expected number of pulls. This also reduces the player's probability of success conditional on their budget, reducing their expected utility, thus making this a constraint. The probability of success can be increased by lowering both soft pity and hard pity, but this may reduce some potential revenues. We can define the conditional probability distribution with equations using two straight lines, with a discontinuity at the soft pity. Below soft pity, the conditional probability is flat at a rate of $\underline{p}$, and above the soft pity the probability rises to 1 at hard pity. Thus:
$$P(x|k) = \begin{cases} \underline{p} & \text{if } k < k_{soft} \\ \underline{p} + \frac{(1-\underline{p})}{k_{hard}+1-k_{soft}}(k+1-k_{soft}) & \text{if } k_{soft} \le k \le k_{hard} \end{cases}$$
This is the general form of @eq-cond-prob.
However, we want the version conditional on the budget, which is easy: it essentially makes the probability zero at and above the budget:
$$P(x|k, b_a) = \begin{cases} \underline{p} & \text{if } k < k_{soft} \text{ \& } k < b_a \\ \underline{p} + \frac{(1-\underline{p})}{k_{hard}+1-k_{soft}}(k+1-k_{soft}) & \text{if } k_{soft} \le k \le k_{hard} \text{ \& } k < b_a \\ 0 & \text{if } k \ge b_a\end{cases}$$
We can then use this to find the general form of the joint probability of success and pity, conditional on the budget, which will be needed to find the expected utility function (note that surviving each pull past soft pity contributes both a $(1-\underline{p})$ factor and a combinatorial factor):
$$P(x \cap k|b_a) = \begin{cases} \underline{p}(1-\underline{p})^k & \text{if } k < k_{soft} \text{ \& } k < b_a \\ \left[\underline{p} + \frac{(1-\underline{p})}{k_{hard}+1-k_{soft}}(k+1-k_{soft})\right](1-\underline{p})^{k}\,\Pi^k_{i=k_{soft}}\left(\frac{k_{hard}+1-i}{k_{hard}+1-k_{soft}}\right) & \text{if } k_{soft} \le k \le k_{hard} \text{ \& } k < b_a \\ 0 & \text{if } k \ge b_a\end{cases}$$
We can simplify the product at the end of the second case with some evaluation, given that the denominator is a constant and the numerator terms are of the form $A-i$ where $i$ is an integer:
$$\Pi^k_{i=k_{soft}}\left(\frac{k_{hard}+1-i}{k_{hard}+1-k_{soft}}\right) =\frac{\Pi^k_{i=k_{soft}}(k_{hard}+1-i)}{\Pi^k_{i=k_{soft}}(k_{hard}+1-k_{soft})}$$
The denominator can be evaluated easily:
$$\Pi^k_{i=k_{soft}}(k_{hard}+1-k_{soft}) = (k_{hard}+1-k_{soft})^{k-k_{soft}+1}$$
The numerator is a bit more tricky, but can be evaluated as:
$$\Pi^k_{i=k_{soft}}(k_{hard}+1-i) = (k_{hard}+1-k_{soft})(k_{hard}-k_{soft})\cdots(k_{hard}+1-k) = \frac{(k_{hard}+1-k_{soft})!}{(k_{hard}-k)!}$$
Combining these we get:
$$\Pi^k_{i=k_{soft}}\left(\frac{k_{hard}+1-i}{k_{hard}+1-k_{soft}}\right) = \frac{(k_{hard}+1-k_{soft})!}{(k_{hard}-k)!\,(k_{hard}+1-k_{soft})^{k-k_{soft}+1}}$$
Substituting this in:
$$P(x \cap k|b_a) = \begin{cases} \underline{p}(1-\underline{p})^k & \text{if } k < k_{soft} \text{ \& } k < b_a \\ \left[\underline{p} + \frac{(1-\underline{p})}{k_{hard}+1-k_{soft}}(k+1-k_{soft})\right]\frac{(k_{hard}+1-k_{soft})!\,(1-\underline{p})^{k}}{(k_{hard}-k)!\,(k_{hard}+1-k_{soft})^{k-k_{soft}+1}} & \text{if } k_{soft} \le k \le k_{hard} \text{ \& } k < b_a \\ 0 & \text{if } k \ge b_a\end{cases}$$
To find the expected utility we need to sum this probability over all integer values of $k$ from 0 to $b_a-1$. To make this easier, start with the case where $b_a \le k_{soft}$, i.e. where the player's budget is no larger than the soft pity. In this case, we can just sum the first term:
$$\sum_{k=0}^{b_a-1} P(x \cap k|b_a \le k_{soft}) = \underline{p}\sum_{k=0}^{b_a-1} (1-\underline{p})^k = \underline{p}\frac{1-(1-\underline{p})^{b_a}}{\underline{p}}$$
$$\sum_{k=0}^{b_a-1} P(x \cap k|b_a \le k_{soft}) = 1-(1-\underline{p})^{b_a}$$
This is 1 minus the probability of never winning, i.e. the probability of winning at least once, which is exactly what the sum of these probabilities represents. With this knowledge, the case where the budget is more than the soft pity becomes easier, as we only have to subtract the probability of never winning within the budget from 1:
$$\sum_{k=0}^{b_a-1} P(x \cap k|b_a > k_{soft}) = 1-\frac{(k_{hard}+1-k_{soft})!\,(1-\underline{p})^{b_a}}{(k_{hard}-b_a)!\,(k_{hard}+1-k_{soft})^{b_a-k_{soft}+1}}$$
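A sketch checking the never-winning closed form above against the brute-force product, under assumed parameter values:

```python
import math

P_LOW, K_SOFT, K_HARD = 0.01, 74, 89  # assumed example parameters

def p_cond(k):
    if k < K_SOFT:
        return P_LOW
    return P_LOW + (1 - P_LOW) * (k + 1 - K_SOFT) / (K_HARD + 1 - K_SOFT)

def never_win_brute(b):
    """Probability of no success in b pulls, by direct product."""
    out = 1.0
    for k in range(b):
        out *= 1 - p_cond(k)
    return out

def never_win_closed(b):
    """Closed form above, valid for k_soft < b <= k_hard."""
    return ((1 - P_LOW) ** b * math.factorial(K_HARD + 1 - K_SOFT)
            / (math.factorial(K_HARD - b)
               * (K_HARD + 1 - K_SOFT) ** (b - K_SOFT + 1)))

for b in (75, 80, 85, 89):
    assert math.isclose(never_win_brute(b), never_win_closed(b))
print("closed form matches brute force")
```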
### Simplifying

At this point, I've realised I may need to take the gacha system all the way to its simplest form, which means removing any pity system and therefore allowing the distribution to have an infinite domain. I've made this decision as I keep getting rather intractable maths, which clearly has solutions (I think finite, and even unique ones), but none in particularly closed form. For example, a probability with two factorials in it is difficult to work with further. If we define the conditional probability of success, with no pity system, as:
$$P(x|k) = P(x) = \underline{p}$$
then the joint probability of success and pity, conditional on the budget, is:
$$P(x \cap k|b_a) = \begin{cases} \underline{p}(1-\underline{p})^k & \text{if } k < b_a \\ 0 & \text{if } k \ge b_a\end{cases}$$
Using the simplified expected utility function, with no future items and linear utility, we can find the expected utility with no pity system:
$$E(U_{Gacha}) = v_a \sum_{k=0}^{b_a-1} P(x\cap k|b_a) - c(\hat{w}_a)$$
$$\Rightarrow E(U_{Gacha}) = v_a \underline{p}\sum_{k=0}^{b_a-1} (1-\underline{p})^k - c(\hat{w}_a)$$
$$\Rightarrow E(U_{Gacha}) = v_a \underline{p}\frac{1-(1-\underline{p})^{b_a}}{\underline{p}} - c(\hat{w}_a)$$
$$\Rightarrow E(U_{Gacha}) = v_a(1-(1-\underline{p})^{b_a}) - c(\hat{w}_a)$$
Like before, this equation shows that the expected utility for the consumer is the valuation of the item multiplied by the probability of winning it (which is 1 minus the probability of never winning it), minus the cost of earning in-game currency. We can view this as a participation constraint, used by the firm to balance the number of players willing to engage in the gacha system against minimising the probability of winning.

Given that the set of players is $A$, with a size of $\bar{A}$, we can denote the set of players willing to engage in the gacha system as $A_{Gacha}$, with a size of $\bar{A}_{Gacha}$. This set contains all players which satisfy the participation constraint, and therefore have a positive expected utility. As the probability of success increases, this set will grow, thereby increasing the number of consumers but reducing the expected revenue per consumer, and vice versa. Recall that the cost function for in-game currency is assumed proportional to the amount of in-game currency earned, at a rate of $c\in (0, 1)$, and that the optimal level of in-game currency is given by:
$$\hat{w}_a = \max\bigg(\min\bigg(\frac{v_a - z_a - w_{min}}{1+c}, w_{max}-w_{min}\bigg), 0\bigg)$$
Substituting into the expected utility function gives:
$$E(U_{Gacha}) = \begin{cases}v_a(1-(1-\underline{p})^{b_a}) & \text{if } \frac{v_a - z_a - w_{min}}{1+c} \lt 0 \\ v_a(1-(1-\underline{p})^{b_a}) - c\bigg(\frac{v_a - z_a - w_{min}}{1+c}\bigg) & \text{if } \frac{v_a - z_a - w_{min}}{1+c} \in [0, w_{max}-w_{min}]\\ v_a(1-(1-\underline{p})^{b_a}) - c(w_{max}-w_{min}) & \text{if } \frac{v_a - z_a - w_{min}}{1+c} \gt w_{max}-w_{min}\end{cases}$$
Given that $v_a$ and $z_a$ have known distributions, the firm will set $\underline{p}$ to maximise the expected revenue.
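Putting the pieces together, a sketch of this expected utility (and hence the participation decision) for the no-pity case, with assumed bounds and cost:

```python
def expected_utility_no_pity(v, z, p_low, w_min=20, w_max=70, c=0.1):
    """E(U_Gacha) for the no-pity case with linear utility and linear earning cost."""
    w_hat = max(0.0, min((v - z - w_min) / (1 + c), w_max - w_min))
    b = z + w_min + w_hat          # total budget in pulls
    p_win = 1 - (1 - p_low) ** b   # probability of at least one success in b pulls
    return v * p_win - c * w_hat

# A low-valuation F2P-ish player versus a high-valuation paying player, p_low = 1%
print(expected_utility_no_pity(v=30, z=0, p_low=0.01))
print(expected_utility_no_pity(v=90, z=60, p_low=0.01))
```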
The expected revenue per player is the expected number of pulls needed to get exactly one success, which, under the simpler geometric distribution, is given by:

$$E(\bar{n}|x) = \sum_{n=1}^{\infty} n \underline{p}(1-\underline{p})^{n-1}$$

$$\Rightarrow E(\bar{n}|x) = \frac{1}{\underline{p}}$$

When we condition on the budget, we must adjust this slightly, renormalising by the probability of success within $b_a$ pulls:

$$E(\bar{n}|x, b_a) = \frac{1}{\sum_{n=1}^{b_a}\underline{p}(1-\underline{p})^{n-1}}\, E(\bar{n}|x)$$

$$\Rightarrow E(\bar{n}|x, b_a) = \frac{1}{\underline{p}(1-(1-\underline{p})^{b_a})}$$

This is very similar, just adjusted upwards, as the distribution is zero beyond the budget. After substituting in for the budget, we can conclude that the expected revenue per player is:

$$E(\Pi_a) = \begin{cases}\frac{1}{\underline{p}(1-(1-\underline{p})^{z_a+w_a})} & \text{if } E(U_{Gacha})\ge 0 \\ 0 & \text{if } E(U_{Gacha})\lt 0\end{cases}$$

As noted above, lowering the probability of success increases the expected revenue per consumer, but reduces the number of consumers willing to engage in the gacha system. To analyse this, consider the expected utility of the gacha system in each of the three cases above. Start with the lower bound of $w_{min}$ in-game currency; these players typically have lower valuations and/or high real-money budgets. They have no need for additional in-game currency, so their expected utility is strictly positive, as there is no cost of earning in-game currency:

$$E(U_{Gacha}|\hat{w}_a=0) = v_a(1-(1-\underline{p})^{z_a+w_{min}}) \gt 0$$

If the population consisted entirely of these customers then, in the Nash equilibrium (NE), the firm would set the probability of success to zero, leaving every consumer with zero utility. This kind of NE will feel familiar to game theorists, not because it appears in other models, but because the outcome is somewhat pointless: consumers engage with the system for literally no gain, and yet it is the equilibrium.

Next, consider the opposite end of the in-game currency range, where consumers max it out. These consumers typically have higher valuations of the item and/or low real-money budgets, and therefore need to earn the maximum in-game currency to have a realistic chance of winning the item. They have an expected utility of:

$$E(U_{Gacha}|\hat{w}_a=w_{max}-w_{min}) = v_a(1-(1-\underline{p})^{z_a+w_{max}}) - c(w_{max}-w_{min})$$

If the expected utility is weakly positive, the firm earns the expected revenue given above; if it is negative, the firm earns nothing. Therefore, we can evaluate the total revenue:

$$E(\Pi|\hat{w}_a=w_{max}-w_{min}) = \sum_{a \in A} \frac{1}{\underline{p}(1-(1-\underline{p})^{z_a+w_{max}})}\, \mathbb{1}\big[v_a(1-(1-\underline{p})^{z_a+w_{max}}) \ge c(w_{max}-w_{min})\big]$$

The consumer's participation constraint can be interpreted as requiring the expected valuation of the item (valuation multiplied by the probability of winning) to exceed the cost of earning the additional in-game currency, but it can also be written as:

$$(1-(1-\underline{p})^{z_a+w_{max}}) \ge \frac{c(w_{max}-w_{min})}{v_a}$$

which reads as: the probability of winning must exceed the ratio of the cost of earning the in-game currency to the valuation of the item. The reason to highlight this interpretation is that the probability of winning is also part of the firm's revenue function.
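To get a feel for this trade-off before turning to the full sample, here is a rough sketch of the firm's problem (the uniform draws for $v_a$ and $z_a$ and the cost rate $c=0.2$ are assumptions for illustration, not the distributions used in the appendix):

```python
# Sketch: sweep the base probability p and trade participation off against
# per-player revenue. Distributions and c are assumed placeholders.
import numpy as np

rng = np.random.default_rng(0)
c, w_min, w_max = 0.2, 20, 70
v = rng.uniform(0, 100, 10_000)  # assumed valuation distribution
z = rng.uniform(0, 100, 10_000)  # assumed real-money budgets

for p in [0.005, 0.01, 0.05, 0.2]:
    w_hat = np.clip((v - z - w_min) / (1 + c), 0, w_max - w_min)
    b = z + w_min + w_hat                          # total budget b_a
    eu = v * (1 - (1 - p) ** b) - c * w_hat        # E(U_Gacha)
    rev = np.where(eu >= 0, 1 / (p * (1 - (1 - p) ** b)), 0.0)
    print(f"p={p:.3f}: participants={np.mean(eu >= 0):.2%}, "
          f"revenue={rev.sum():,.0f}")
```

Under these placeholder inputs, lowering $\underline{p}$ barely moves participation while the per-player term explodes, which previews the shape of the graph below.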
There is no analytical way to solve this (that I know of), as it all depends on the distributions of $v_a$ and $z_a$. What we can do is generate a sample from the distributions and find the expected revenue at different levels of $\underline{p}$. Luckily, we have already created this sample in the appendix, so we can use it to find the expected revenue.

```{ojs}
revenue_data = FileAttachment('revenue_total_data.csv').csv({typed: true})

Plot.plot({
  x: {
    label: 'Probability of Success',
    domain: [0, 1],
  },
  y: {
    label: 'Expected Revenue',
    domain: [0, 50000000],
    ticks: 10,
    tickFormat: d3.format('.2s'),
  },
  color: {
    legend: true,
    range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)'],
    domain: ['Revenue, Linear Cost', 'Revenue, Quadratic Cost'],
  },
  marks: [
    Plot.line(revenue_data, {x: 'p_underline', y: 'revenue', stroke: 'var(--plot-rule-color-1)'}),
    Plot.line(revenue_data, {x: 'p_underline', y: 'revenue_quad', stroke: 'var(--plot-rule-color-2)', strokeDasharray: '5,5'}),
    // Plot.crosshairX(revenue_data, {x: 'p_underline', y: 'revenue', textFill: 'var(--plot-rule-color-1)', textStroke: 'var(--bs-body-bg)', ruleStroke: 'var(--plot-rule-color-1)'}),
    Plot.frame()
  ],
  width: width
})
```

The graph above shows that the firm should set the probability of success as low as possible to maximise revenue. But won't this cause a massive number of consumers to drop out? It turns out that the cost of earning the needed additional in-game currency is simply not large enough to outweigh the expected utility of the item.

### Adding Future items

The reason for including the next available item is that gacha games often announce all limited-time items in a patch before the patch is released, with each limited-time item available during one half of the patch, so the value of both items informs the decision-making process. For example, if you quite like the current item but really like the next one, then, with a fixed budget, you are far more likely to skip the current item (or at least spend less on it) in order to improve your chances of getting the next one. This is what gives your remaining budget utility. It means we no longer drop terms from our expected utility function:

$$E(U_{Gacha}(t=1)) = v_{1a} \sum_{k=0}^{b_a-1} P(x\cap k|b_a) + u(z_a + w_a - E(\bar{n}|x, b_a)) - c(\hat{w}_a)$$

Here, I have added a time index for periods 1 and 2, where the first period is the current item and the second period is the next item. I will assume that there are only 2 items, so any remaining budget after the second item is obtained (if it is obtained) yields no utility, just as in the 1-period case above. This means that the expected utility function for the second period is what we have already found, but dropping the cost of earning in-game currency, as this was already incurred in period 1. [3]

This could be expanded to any finite number of items, but usually firms only release information about (and indeed develop) a few items at a time, meaning that the expected utility of items beyond these is not known. We could alternatively set it to some constant, as consumers would assume that future items will exist and yield them some utility. I don't think this would change the analysis much, but I can check later.
The expected utility function for the second period is:

$$E(U_{Gacha}(t=2)) = v_{2a} \sum_{k=0}^{(b_a-\Delta b_a)-1} P(x\cap k|b_a - \Delta b_a)$$

Note that the budget here is whatever remains after period 1, which can range from nothing at all (which would be really bad luck) to the entire budget, if the player did not try for item 1. We can substitute this expected utility above to get the overall expected utility function:

$$E(U_{Gacha}) = v_{1a} \sum_{k=0}^{b_{1a}-1} P(x\cap k|b_{1a}) + v_{2a} \sum_{k=0}^{b_{2a}-1} P(x\cap k|b_{2a}) - c(\hat{w}_a) \text{ where } b_{1a} + b_{2a} = b_a$$

This gives us a pretty simple utility function: the sum of the expected utility of each item, given the budgets, minus the cost of earning in-game currency. As we are using the geometric distribution for the probability of success, we can simplify:

$$E(U_{Gacha}) = v_{1a} \underline{p}\frac{1-(1-\underline{p})^{b_{1a}}}{\underline{p}} + v_{2a} \underline{p}\frac{1-(1-\underline{p})^{b_{2a}}}{\underline{p}} - c(\hat{w}_a)$$

$$\Rightarrow E(U_{Gacha}) = v_{1a}(1-(1-\underline{p})^{b_{1a}}) + v_{2a}(1-(1-\underline{p})^{b_{2a}}) - c(\hat{w}_a)$$

As the budgets must sum to the total budget $(b_{2a}=b_a-b_{1a})$, we can maximise the expected utility with some easy calculus. The first-order condition is:

$$\frac{d E(U_{Gacha})}{d b_{1a}} = -v_{1a} \ln(1-\underline{p})(1-\underline{p})^{b_{1a}} + v_{2a} \ln(1-\underline{p})(1-\underline{p})^{b_a-b_{1a}} = 0$$

Assuming that $\underline{p} \ne 0, 1$, we can simplify and rearrange:

$$\Rightarrow \frac{v_{1a}}{v_{2a}} = \frac{(1-\underline{p})^{b_a-b_{1a}}}{(1-\underline{p})^{b_{1a}}} = (1-\underline{p})^{b_a-2b_{1a}} = (1-\underline{p})^{b_{2a}-b_{1a}}$$

This shows that the ratio of the items' valuations, $\frac{v_{1a}}{v_{2a}}$, equals the ratio of the probabilities of never winning each item within its allocated budget, $(1-\underline{p})^{b_{2a}-b_{1a}}$. In short: the higher the relative valuation, the higher the relative budget. We can solve for the budgets:

$$b_{1a} = \frac{b_a}{2} - \frac{1}{2\ln(1-\underline{p})}\ln\Big(\frac{v_{1a}}{v_{2a}}\Big)$$

Similarly:

$$b_{2a} = \frac{b_a}{2} + \frac{1}{2\ln(1-\underline{p})}\ln\Big(\frac{v_{1a}}{v_{2a}}\Big)$$

We can interpret these as a divergence from a 50% split of the budget. If item 1 is preferred, then $v_{1a} \gt v_{2a}$, which implies $b_{1a} > b_{2a}$, as $\ln\big(\frac{v_{1a}}{v_{2a}}\big) \gt 0$ while $\ln(1-\underline{p})\lt 0$, and vice versa. We can also show that the difference between the budgets is:

$$b_{1a} - b_{2a} = - \frac{1}{\ln(1-\underline{p})}\ln\Big(\frac{v_{1a}}{v_{2a}}\Big)$$

This doesn't add too much, but it is perhaps slightly interesting to look at. We can find the utility by substituting in the budgets:

$$E(U_{Gacha}) = v_{1a}\Big(1-(1-\underline{p})^{\frac{b_a}{2} - \frac{1}{2\ln(1-\underline{p})}\ln(\frac{v_{1a}}{v_{2a}})}\Big) + v_{2a}\Big(1-(1-\underline{p})^{\frac{b_a}{2} + \frac{1}{2\ln(1-\underline{p})}\ln(\frac{v_{1a}}{v_{2a}})}\Big) - c(\hat{w}_a)$$

Noting that $(1-\underline{p})^{\frac{1}{2\ln(1-\underline{p})}\ln(\frac{v_{1a}}{v_{2a}})} = e^{\frac{1}{2}\ln(\frac{v_{1a}}{v_{2a}})} = \sqrt{\frac{v_{1a}}{v_{2a}}}$, both middle terms reduce to $\sqrt{v_{1a}v_{2a}}\,(1-\underline{p})^{\frac{b_a}{2}}$, giving the rather cleaner:

$$\Rightarrow E(U_{Gacha}) = v_{1a} + v_{2a} - 2\sqrt{v_{1a}v_{2a}}\,(1-\underline{p})^{\frac{b_a}{2}} - c(\hat{w}_a)$$

At this point, I'm starting to suspect that the NE will not be very different from the 1-item case.
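A quick numeric check of the budget split (the values are illustrative; the brute force searches a 0.01-pull grid):

```python
# Sketch: verify the closed-form two-item budget split by brute force.
import math

p, b_total = 0.01, 100
v1, v2 = 80.0, 50.0  # illustrative valuations

def eu(b1):
    """Two-item expected utility, ignoring the (constant) currency cost."""
    b2 = b_total - b1
    return v1 * (1 - (1 - p) ** b1) + v2 * (1 - (1 - p) ** b2)

# Closed form: b1 = b/2 - ln(v1/v2) / (2 ln(1-p))
b1_closed = b_total / 2 - math.log(v1 / v2) / (2 * math.log(1 - p))
b1_brute = max((b1 / 100 for b1 in range(0, b_total * 100 + 1)), key=eu)
print(b1_closed, b1_brute)  # both ~73.4: the preferred item gets more budget
```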
The expected utility above acts as a participation constraint similar to the one-item case, just weighted between the two items according to the valuations. We do need to update the revenue function to take the two items into account:

$$E(\Pi_a) = \begin{cases}\frac{1}{\underline{p}(1-(1-\underline{p})^{b_{1a}})} + \frac{1}{\underline{p}(1-(1-\underline{p})^{b_{2a}})} & \text{if } E(U_{Gacha})\ge 0 \\ 0 & \text{if } E(U_{Gacha})\lt 0\end{cases}$$

Here, we have the sum of the expected revenue from item 1, conditional on the budget assigned to item 1, and the expected revenue from item 2, conditional on the budget assigned to item 2.

```{ojs}
revenue_data_two_items = FileAttachment('revenue_total_data_two_items.csv').csv({typed: true})

Plot.plot({
  x: {
    label: 'Probability of Success',
    domain: [0, 1],
  },
  y: {
    label: 'Expected Revenue',
    domain: [0, 25000000],
    ticks: 10,
    tickFormat: d3.format('.2s'),
  },
  color: {
    legend: true,
    range: ['var(--plot-rule-color-1)'],
    domain: ['Revenue, Linear Cost'],
  },
  marks: [
    Plot.line(revenue_data_two_items, {x: 'p_underline', y: 'revenue', stroke: 'var(--plot-rule-color-1)'}),
    Plot.crosshairX(revenue_data_two_items, {x: 'p_underline', y: 'revenue', textFill: 'var(--plot-rule-color-1)', textStroke: 'var(--bs-body-bg)', ruleStroke: 'var(--plot-rule-color-1)'}),
    Plot.frame()
  ],
  width: width
})
```

## Special Cases

### 50/50 System

Here I will talk about gacha without pity, and loot boxes (which are pretty similar). These tend to follow simple geometric distributions, so they are relatively easy to work with.

### Game Quality

The idea here is that the firm can invest more into development in order to create a better game (more development does not strictly mean a better game, of course, but assume a general positive correlation). A better game then raises demand for the items in the game by raising players' real-money budgets, in turn increasing revenue. This would increase revenue in both a fixed-price system and a gacha system, by increasing the price (or expected price) that players pay, but I think the increase would be higher in the fixed-price system (need to check).

In this scenario, we should treat players' real-money budgets as an endogenous variable, dependent on the quality of the game: the better the game, the more enjoyment players get, and the more they would be willing to pay for items in it. This happens quite frequently in free games whose only source of monetisation is in-game transactions (skins, gacha, battle passes, etc.), where players base their in-game purchases on how much they would have been willing to pay for the game had it not been free.
Some players even see this as generosity towards the game's producers: they liked a game the studio made, want it to make more, and so want to fund its future projects. Some players also feel that they owe the firm some money, since they enjoyed the game and the firm therefore deserves some compensation.

## Conclusion

## References

::: {#refs}
:::

## Appendix

### Simulation

To illustrate the pity system in gacha, we can simulate the process of pulling 10,000 times and record the number of times each rarity of item is obtained. The simulation can be played, paused, and restarted using the buttons below, as well as by manually sliding the bar to view the results at a specific pull number.
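The simulation itself was run offline and its outputs are loaded below; a minimal sketch of the underlying pull process (assuming the example rates of 90%/9%/1%, soft pity at 75 pulls and hard pity at 100, and omitting the rare rarity's separate pity counter for brevity) might look like:

```python
# Illustrative single-rarity pity simulator: a sketch, not the code that
# generated the CSVs loaded below.
import random

def pull_once(pity, p=0.01, soft=74, hard=99):
    """One pull at the given legendary pity; returns (rarity, new_pity)."""
    prob = p if pity < soft else p + (1 - p) * (pity + 1 - soft) / (hard + 1 - soft)
    if random.random() < prob:
        return 'Legendary', 0  # a success resets the pity counter
    return ('Rare' if random.random() < 0.09 else 'Common'), pity + 1

pity, counts = 0, {'Common': 0, 'Rare': 0, 'Legendary': 0}
for _ in range(10_000):
    rarity, pity = pull_once(pity)
    counts[rarity] += 1
print(counts)  # roughly 90/9/1, with extra legendaries thanks to pity
```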
```{python}
import json

import numpy as np
import pandas as pd

# Load the pre-computed simulation outputs.
pity_lists = pd.read_csv('simple_pity_lists.csv')
rare_pity_list = pity_lists['Rare'].tolist()
legendary_pity_list = pity_lists['Legendary'].tolist()

rarity_counts = pd.read_csv('simple_rarity_count_lists.csv')
Common_count_list = rarity_counts['Common'].tolist()
Rare_count_list = rarity_counts['Rare'].tolist()
Legendary_count_list = rarity_counts['Legendary'].tolist()

obtained_pity_lists = pd.read_csv('simple_obtained_pity_lists.csv')
rare_obtained_pity_list = obtained_pity_lists['Rare'].tolist()
legendary_obtained_pity_list = obtained_pity_lists['Legendary'].tolist()
legendary_obtained_pity_list = [int(x) for x in legendary_obtained_pity_list if str(x) != 'nan']

pull_totals = json.loads(open('simple_pull_totals.json').read())

legendary_pity_sims = pd.read_csv('legendary_pity_sim_stats.csv')
legendary_pity_sims_25 = legendary_pity_sims['25th Percentile'].tolist()
legendary_pity_sims_50 = legendary_pity_sims['50th Percentile'].tolist()
legendary_pity_sims_75 = legendary_pity_sims['75th Percentile'].tolist()
legendary_pity_sims_mean = legendary_pity_sims['Mean'].tolist()
legendary_pity_sims_mode = legendary_pity_sims['Mode'].tolist()

ojs_define(Common_count_list = Common_count_list)
ojs_define(Rare_count_list = Rare_count_list)
ojs_define(Legendary_count_list = Legendary_count_list)
ojs_define(rare_pity_list = rare_pity_list)
ojs_define(legendary_pity_list = legendary_pity_list)
ojs_define(legendary_obtained_pity_list = legendary_obtained_pity_list)
```

```{ojs}
simple_simulation_data = FileAttachment('simple_simulation_data.csv').csv({typed: true})
simulation_data = []
empty_data = []
filter_sim_data = {
  for (var i = 0; i < simple_simulation_data.length; i++) {
    simulation_data.push({'pull': simple_simulation_data[i].pull, 'x': simple_simulation_data[i].x, 'y': simple_simulation_data[i].y, 'rarity': simple_simulation_data[i].rarity})
    empty_data.push({'pull': simple_simulation_data[i].pull, 'x': simple_simulation_data[i].x, 'y': simple_simulation_data[i].y, 'rarity': ''})
  }
}
```

```{ojs}
Plot.plot({
  padding: 0,
  grid: true,
  x: {
    axis: 'top',
    label: 'Pulls',
    ticks: d3.ticks(0, 100, 10)
  },
  y: {
    label: "100's of pulls",
    ticks: d3.ticks(0, 100, 10)
  },
  color: {
    legend: true,
    domain: ['Common', 'Rare', 'Legendary', ''],
    range: ['cornflowerblue', 'blueviolet', 'gold', 'white'],
  },
  marks: [
    Plot.cell(simulation_data.slice(0, pull_number).concat(empty_data.slice(pull_number, 10000)), {x: 'x', y: 'y', fill: 'rarity', stroke: 'black', strokeWidth: 0.5}),
    Plot.frame()
  ],
  width: width,
  height: width
})
```

<p style="text-align: center; font-weight: bold">Moving Totals</p>
<div style="display: inline-flex">
<div style="margin-right: 5px">
<table>
  <tr>
    <td colspan=3 style="text-align: center">Pull Totals</td>
  </tr>
  <tr>
    <td style="width: 100px; padding-left: 5px; background-color: cornflowerblue;">Common</td>
    <td style="width: 100px; padding-left: 5px; background-color: blueviolet; color: white;">Rare</td>
    <td style="width: 100px; padding-left: 5px; background-color: gold;">Legendary</td>
  </tr>
  <tr>
    <td style="padding-left: 5px"> ${Common_count_list[pull_number-1]} </td>
    <td style="padding-left: 5px"> ${Rare_count_list[pull_number-1]} </td>
    <td style="padding-left: 5px"> ${Legendary_count_list[pull_number-1]} </td>
  </tr>
</table>
</div>
<div style="margin-right: 5px">
<table>
  <tr>
    <td colspan=2 style="text-align: center">Pity Counters</td>
  </tr>
  <tr>
    <td style="width: 100px; padding-left: 5px; background-color: blueviolet; color: white;">Rare</td>
    <td style="width: 100px; padding-left: 5px; background-color: gold;">Legendary</td>
  </tr>
  <tr>
    <td style="padding-left: 5px"> ${rare_pity_list[pull_number]} </td>
    <td style="padding-left: 5px"> ${legendary_pity_list[pull_number]} </td>
  </tr>
</table>
</div>
<div>
<div style="display: inline-flex; padding-left: 193px">

```{ojs}
viewof simulation_restart = html`<form class="Restart_Simulation">${Object.assign(html`<button type=button><i class="bi bi-skip-start"></i>`, {onclick: event => event.currentTarget.dispatchEvent(new CustomEvent("input", {bubbles: true}))})}`
```

```{ojs}
viewof simulation_play = html`<form class="Play_Simulation" id="sim-play">${Object.assign(html`<button type=button><i class="bi bi-play"></i>`, {onclick: event => event.currentTarget.dispatchEvent(play())})}`
```

```{ojs}
viewof simulation_pause = html`<form class="Pause_Simulation" id="sim-pause" style="display: none">${Object.assign(html`<button type=button onclick="clearInterval(play)"><i class="bi bi-pause"></i>`, {onclick: event => event.currentTarget.dispatchEvent(pause())})}`
```

```{ojs}
viewof simulation_end = html`<form class="End_Simulation">${Object.assign(html`<button type=button><i class="bi bi-skip-end"></i>`, {onclick: event => event.currentTarget.dispatchEvent(new CustomEvent("input", {bubbles: true}))})}`
```

</div>

```{ojs}
viewof pull_number = Inputs.range(
  [1, 10000],
  {
    step: 1,
    value: 1,
    label: 'Number of pulls',
    id: 'pull_number'
  }
)
```

</div>
</div>

```{ojs}
restart = {
  simulation_restart;
  const pull_number = document.getElementById('oi-3a86ea-1');
  pull_number.value = 1
  pull_number.dispatchEvent(new CustomEvent("input", {bubbles: true}))
}

end = {
  simulation_end;
  const pull_number = document.getElementById('oi-3a86ea-1');
  pull_number.value = 10000
  pull_number.dispatchEvent(new CustomEvent("input", {bubbles: true}))
}
```

<script>
let nIntervId;

function play() {
  const play = document.getElementById('sim-play');
  const pause = document.getElementById('sim-pause');
  play.style.display = 'none'
  pause.style.display = 'inline-block'
  nIntervId = setInterval(start_sim, 67)
}

function start_sim() {
  const pull_number = document.getElementById('oi-3a86ea-1');
  const play = document.getElementById('sim-play');
  const pause = document.getElementById('sim-pause');
  if (pull_number.value == 10000) {
    clearInterval(nIntervId)
    play.style.display = 'inline-block'
    pause.style.display = 'none'
  } else {
    pull_number.value = parseInt(pull_number.value) + 1
    pull_number.dispatchEvent(new CustomEvent("input", {bubbles: true}))
  }
}

function pause() {
  const play = document.getElementById('sim-play');
  const pause = document.getElementById('sim-pause');
  play.style.display = 'inline-block'
  pause.style.display = 'none'
  clearInterval(nIntervId)
  nIntervId = null
}
</script>

<p style="text-align: center; font-weight: bold">Statistics</p>
<div style="display: inline-flex">
<div style="margin-right: 5px">
<table>
  <tr>
    <td colspan=3 style="text-align: center">Pull Totals</td>
  </tr>
  <tr>
    <td style="width: 100px; padding-left: 5px; background-color: cornflowerblue;">Common</td>
    <td style="width: 100px; padding-left: 5px; background-color: blueviolet; color: white;">Rare</td>
    <td style="width: 100px; padding-left: 5px; background-color: gold;">Legendary</td>
  </tr>
  <tr>
    <td style="padding-left: 5px">`{python} pull_totals['Common']`</td>
    <td style="padding-left: 5px">`{python} pull_totals['Rare']`</td>
    <td style="padding-left: 5px">`{python} pull_totals['Legendary']`</td>
  </tr>
</table>
</div>
<div style="margin-right: 5px">
<table>
  <tr>
    <td colspan=2 style="text-align: center">Min Pity</td>
  </tr>
  <tr>
    <td style="width: 100px; padding-left: 5px; background-color: blueviolet; color: white;">Rare</td>
    <td style="width: 100px; padding-left: 5px; background-color: gold;">Legendary</td>
  </tr>
  <tr>
    <td style="padding-left: 5px">`{python} np.min(rare_obtained_pity_list)`</td>
    <td style="padding-left: 5px">`{python} np.min(legendary_obtained_pity_list)`</td>
  </tr>
</table>
</div>
<div style="margin-right: 5px">
<table>
  <tr>
    <td colspan=2 style="text-align: center">Max Pity</td>
  </tr>
  <tr>
    <td style="width: 100px; padding-left: 5px; background-color: blueviolet; color: white;">Rare</td>
    <td style="width: 100px; padding-left: 5px; background-color: gold;">Legendary</td>
  </tr>
  <tr>
    <td style="padding-left: 5px">`{python} np.max(rare_obtained_pity_list)`</td>
    <td style="padding-left: 5px">`{python} np.max(legendary_obtained_pity_list)`</td>
  </tr>
</table>
</div>
<div>
<table>
  <tr>
    <td colspan=2 style="text-align: center">Average Pity</td>
  </tr>
  <tr>
    <td style="width: 100px; padding-left: 5px; background-color: blueviolet; color: white;">Rare</td>
    <td style="width: 100px; padding-left: 5px; background-color: gold;">Legendary</td>
  </tr>
  <tr>
    <td style="padding-left: 5px">`{python} round(np.mean(rare_obtained_pity_list), 2)`</td>
    <td style="padding-left: 5px">`{python} round(np.mean(legendary_obtained_pity_list), 2)`</td>
  </tr>
</table>
</div>
</div>

I have also included some statistics from this simulation above. The first table shows a moving total for each rarity as the number of pulls increases, together with how the pity counters change over time, to illustrate how the pity system works. The second set of tables shows the final totals, as well as the minimum, maximum, and average number of pulls required to obtain a rare and a legendary item. In addition, the distribution of the pity for legendary items is graphed in @fig-simulation-pity-legendary, against the predicted counts based on @fig-joint-density.
```{ojs}
//| label: fig-simulation-pity-legendary
//| fig-cap: 'Pity Distribution for Legendary Items'
{
  let Pity_data = []
  let Pity_predicted = []
  let prob_x_k = 0
  let pity_list = legendary_obtained_pity_list
  for (var i = 0; i <= 99; i++) {
    let count = pity_list.filter(x => x == i).length
    Pity_data.push({'Pity': i, 'Count': count})
  }
  for (var k = 0; k <= 99; k++) {
    if (k < 74) {
      prob_x_k = 0.01 * (0.99 ** k)
      Pity_predicted.push({'Pity': k, 'Count': prob_x_k * Legendary_count_list[9999]})
    } else {
      let prob_failure_k_74 = 0.99 ** 73
      for (var i = 74; i <= k; i++) {
        prob_failure_k_74 *= (0.99 - 0.99 * (i - 74) / 26)
      }
      prob_x_k = (0.01 + 0.99 * (k - 73) / 26) * prob_failure_k_74
      Pity_predicted.push({'Pity': k, 'Count': prob_x_k * Legendary_count_list[9999]})
    }
  }
  const plot = Plot.plot({
    x: {
      label: 'Pity',
      domain: [-2, 102],
      ticks: d3.ticks(0, 100, 19)
    },
    y: {
      label: 'Count',
      domain: [0, 14]
    },
    color: {
      legend: true,
      domain: ['Simulation Data', 'Prediction'],
      range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)'],
    },
    marks: [
      Plot.ruleX(Pity_data, {x: 'Pity', y: 'Count', stroke: 'var(--plot-rule-color-1)', strokeWidth: 7}),
      Plot.line(Pity_predicted, {x: 'Pity', y: 'Count', stroke: 'var(--plot-rule-color-2)'}),
      Plot.tip(Pity_data, Plot.pointerX({x: 'Pity', y: 'Count', fill: "var(--bs-body-bg)"})),
      Plot.frame(),
    ],
    width: width,
  })
  return plot
}
```

The table below also shows legendary pity statistics from the simulation and compares them to the predictions.

::: {#tbl-simulation-pity}
<table style="width: 100%">
  <tr style="border-bottom: 1px solid black">
    <th></th>
    <th>Mean Pity</th>
    <th>Modal Pity</th>
    <th>25th Percentile Pity</th>
    <th>Median Pity</th>
    <th>75th Percentile Pity</th>
  </tr>
  <tr>
    <td>Simulation</td>
    <td>`{python} round(np.mean(legendary_obtained_pity_list), 2)`</td>
    <td>`{python} np.argmax(np.bincount(legendary_obtained_pity_list))`</td>
    <td>`{python} int(np.percentile(legendary_obtained_pity_list, 25))`</td>
    <td>`{python} int(np.median(legendary_obtained_pity_list))`</td>
    <td>`{python} int(np.percentile(legendary_obtained_pity_list, 75))`</td>
  </tr>
  <tr>
    <td>Prediction</td>
    <td>`{python} round(exp_n_x, 2)`</td>
    <td>77</td>
    <td>28</td>
    <td>68</td>
    <td>77</td>
  </tr>
</table>

Descriptive Statistics for Legendary Pity from Simulation and Prediction
:::

The simulated data and predictions match very well, with the exception of the 25th percentile, which sits in the middle of the region with the most variance, owing to the low probability of success there[^1]. Looking at the simulated data, there are fewer successes between 0 and 33 than between 34 and the median of 67, which is why the 25th percentile is 10 pity higher in the simulation than in the prediction.

[^1]: Not 100% sure about this, but my gut tells me that the low probability of these events causes very large variance, which can make the 25th percentile more volatile than other percentiles. In fact, this carries through for any percentile below the median; the median itself is very close to the soft pity, and thus to the higher-likelihood events, and is therefore less volatile.

To test whether this 25th percentile difference is a one-off or a consistent bias, I will run the simulation 100 times and compare the 25th percentile of the pity in each run to the prediction.
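A sketch of that experiment (self-contained, using the same illustrative simulator as earlier; parameter names are assumptions, not the production code):

```python
# Sketch: 100 runs of 10,000 pulls each, collecting the 25th percentile of
# the legendary pity per run.
import random

import numpy as np

def run_once(pulls=10_000, p=0.01, soft=74, hard=99):
    """Simulate one run; return the pity values at which legendaries dropped."""
    pity, obtained = 0, []
    for _ in range(pulls):
        prob = p if pity < soft else p + (1 - p) * (pity + 1 - soft) / (hard + 1 - soft)
        if random.random() < prob:
            obtained.append(pity)
            pity = 0
        else:
            pity += 1
    return obtained

percentiles_25 = [np.percentile(run_once(), 25) for _ in range(100)]
print(round(np.mean(percentiles_25), 1), round(np.std(percentiles_25), 1))
```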
The distribution of the 25th percentile can be seen in @fig-simulation-pity-by-percentile.

```{ojs}
legendary_pity_sims = FileAttachment('legendary_pity_sim_stats.csv').csv({typed: true})
data_with_50_50 = FileAttachment('50-50-graph-data-1.csv').csv({typed: true})
```

```{ojs}
//| label: fig-simulation-pity-by-percentile
//| fig-cap: 'Pity Distributions for Legendary Items'
{
  let percentile_data_25 = []
  let percentile_data_50 = []
  let percentile_data_75 = []
  let Pity_data = []
  for (var i = 0; i < legendary_pity_sims.length; i++) {
    percentile_data_25.push(legendary_pity_sims[i]['25th Percentile'])
    percentile_data_50.push(legendary_pity_sims[i]['50th Percentile'])
    percentile_data_75.push(legendary_pity_sims[i]['75th Percentile'])
  }
  for (var i = 0; i <= 99; i++) {
    let count_25 = percentile_data_25.filter(x => x == i).length
    if (count_25 > 0) {
      Pity_data.push({'Pity': i, 'Count': count_25, 'Percentile': '25', color: 'var(--plot-rule-color-1)'})
    }
    let count_50 = percentile_data_50.filter(x => x == i).length
    if (count_50 > 0) {
      Pity_data.push({'Pity': i, 'Count': count_50, 'Percentile': '50', color: 'var(--plot-rule-color-2)'})
    }
    let count_75 = percentile_data_75.filter(x => x == i).length
    if (count_75 > 0) {
      Pity_data.push({'Pity': i, 'Count': count_75, 'Percentile': '75', color: 'var(--plot-rule-color-3)'})
    }
  }
  const plot = Plot.plot({
    x: {
      label: 'Pity',
      domain: [-2, 102],
    },
    y: {
      label: 'Count',
      domain: [0, 65]
    },
    color: {
      legend: true,
      domain: ['25', '50', '75'],
      range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)', 'var(--plot-rule-color-3)'],
      label: 'Percentile',
    },
    marks: [
      Plot.ruleX(Pity_data, {y: 'Count', x: 'Pity', stroke: 'color', strokeWidth: 7}),
      Plot.tip(Pity_data, Plot.pointerX({x: 'Pity', y: 'Count', fill: "var(--bs-body-bg)"})),
      Plot.frame(),
    ],
    width: width,
  })
  return plot
}
```

One can see that the range of values for the 25th percentile includes both the predicted value (28), towards the middle, and the value from the single simulation above (38), towards the tail. This suggests that the earlier result was indeed an outlier, and that the simulation is consistent with the predictions. Further, the variance of the 25th percentile and the median is larger than that of the 75th percentile, as hypothesised. One can explain this by pointing to the variance of the geometrically distributed section of the probability distribution, on which the 25th percentile and the median lie. If this geometric section continued to infinity (as it properly would without a pity system), its variance would be $\frac{1-p}{p^2}$, which is large for small probabilities. This accounts for the larger variance in the 25th percentile and median, and the smaller variance in the 75th percentile, even if not in a mathematically exact or rigorous way.

### In-Game Currency Simulation

Below are the histograms for the in-game currency distribution, simulated for 10,000 players. They show both the unrestricted case, with no minimum or maximum on the in-game currency, and the restricted version used in the model.
Given that:

$$w_a = w_{min} + \hat{w}_a = w_{min} + \frac{v_a - z_a - w_{min}}{1+c} = \frac{v_a - z_a + cw_{min}}{1+c}$$

we can see that the amount of in-game currency earned will:

- Have a lower bound at $w_{min}$
- Increase with $w_{min}$, but at a rate of $\frac{c}{1+c}$, which is less than 1
- Increase with the valuation of the item, $v_a$
- Decrease with the real-money budget, $z_a$
- Decrease with the cost of earning the in-game currency, $c$

The unrestricted ($w_{min}=0$ & $w_{max}=\infty$) in-game currency equation used is:

$$w_{unrestr} = \frac{v_a - z_a}{1+c}$$

The restricted ($w_{min}<w_{max}$ & $w_{min},w_{max} \in \mathbb{R}_{++}$) in-game currency equation used is:

$$w = \begin{cases} w_{min} & \text{if } v_a - z_a < w_{min} \\ w_{min} + \frac{v_a - z_a - w_{min}}{1+c} & \text{if } w_{min} \le w_{min} + \frac{v_a - z_a - w_{min}}{1+c} \le w_{max} \\ w_{max} & \text{if } w_{min} + \frac{v_a - z_a - w_{min}}{1+c} > w_{max}\end{cases}$$
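As a concrete restatement, here is a small sketch of these two equations ($w_{min}=20$ and $w_{max}=70$ follow this appendix; $c=0.2$ is a placeholder):

```python
# Sketch of the unrestricted and restricted in-game currency rules.
# w_min = 20 and w_max = 70 follow the appendix; c = 0.2 is a placeholder.

def w_unrestricted(v, z, c=0.2):
    return (v - z) / (1 + c)

def w_restricted(v, z, c=0.2, w_min=20, w_max=70):
    interior = w_min + (v - z - w_min) / (1 + c)
    return min(max(interior, w_min), w_max)

print(w_restricted(v=10, z=50))   # 20: binds at w_min
print(w_restricted(v=80, z=20))   # ~53.3: interior solution
print(w_restricted(v=250, z=10))  # 70: binds at w_max
```

The three prints correspond to the three cases of the restricted equation.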
```{ojs}
//| label: in-game-currency-dist-sim
//| fig-cap: 'In-Game Currency Distribution Simulation'
w_a_data = FileAttachment('in-game-currency-dist-sim.csv').csv({typed: true})
w_a_hist_data = FileAttachment('in-game-currency-dist-sim-hist.csv').csv({typed: true})
{
  const plot = Plot.plot({
    x: {
      label: 'In-Game Currency, w',
      domain: [0, 100],
    },
    y: {
      label: 'Density',
      domain: [0, 0.35],
    },
    color: {
      legend: true,
      domain: ['w', 'w-unrestricted'],
      range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)']
    },
    marks: [
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'w', thresholds: 50, fill: 'var(--plot-rule-color-1)'})),
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'w_unrestr', thresholds: 120, fill: 'var(--plot-rule-color-2)', opacity: 0.5})),
      Plot.text([[20.5, 0.29]], {text: ['0.280'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.text([[70.5, 0.203]], {text: ['0.193'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.text([[95, 0.33]], {text: ['N=10,000'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.tip(w_a_hist_data, Plot.pointerX({x: 'bin_mid', y: 'w_unrestr', fill: "var(--bs-body-bg)", title: (d) => `In-Game Currency, w ${d.bin} \n Density (Unrestr) ${d.w_unrestr} \n Density (Restr): ${d.w}`, font: 'var(--bs-small)'})),
      Plot.frame()
    ],
    width: width
  })
  return plot
}
```

Below is a similar histogram for the in-game currency distribution, but with a quadratic cost function, given by:

$$cost = c\hat{w}_a^2 \text{ where } c=0.01$$

One can see that the quadratic cost means far fewer players are willing to earn the maximum level of in-game currency, so the density at the maximum level is shifted down into the lower levels. The unrestricted model likewise shows the upper levels of in-game currency being less desired, with increased density at lower levels. Both seem to have some concavity (at least locally) in the distribution, instead of the uniformity seen with the linear cost function.

```{ojs}
//| label: in-game-currency-dist-sim-quadratic
//| fig-cap: 'In-Game Currency Distribution Simulation with Quadratic Cost Function'
{
  const plot = Plot.plot({
    x: {
      label: 'In-Game Currency, w',
      domain: [0, 100],
    },
    y: {
      label: 'Density',
      domain: [0, 0.35],
    },
    color: {
      legend: true,
      domain: ['w-quad', 'w-quad-unrestricted'],
      range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-2)']
    },
    marks: [
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'w_quad', thresholds: 50, fill: 'var(--plot-rule-color-1)'})),
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'w_quad_unrestr', thresholds: 70, fill: 'var(--plot-rule-color-2)', opacity: 0.5})),
      Plot.text([[20.5, 0.289]], {text: ['0.279'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.text([[70.5, 0.029]], {text: ['0.019'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.text([[95, 0.33]], {text: ['N=10,000'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.tip(w_a_hist_data, Plot.pointerX({x: 'bin_mid', y: 'w_quad_unrestr', fill: "var(--bs-body-bg)", title: (d) => `In-Game Currency, w ${d.bin} \n Density (Unrestr) ${d.w_quad_unrestr} \n Density (Restr): ${d.w_quad}`, font: 'var(--bs-small)'})),
      Plot.frame()
    ],
    width: width
  })
  return plot
}
```

Below is the histogram for the full budget distribution, simulated for 10,000 players: the sum of the in-game currency and the real-money budget, minus the cost of earning the in-game currency. The linear cost is used here for simplicity, and because the convex-cost results are not too dissimilar.

```{ojs}
//| label: full-budget-dist-sim
//| fig-cap: 'Full Budget Distribution Simulation'
{
  const plot = Plot.plot({
    x: {
      label: 'Number of pulls',
      domain: [0, 110],
    },
    y: {
      label: 'Density',
      domain: [0, 0.1],
    },
    color: {
      legend: true,
      domain: ['b', 'b_unrestr'],
      range: ['var(--plot-rule-color-1)', 'var(--plot-rule-color-4)']
    },
    marks: [
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'b', thresholds: 120, fill: 'var(--plot-rule-color-1)', tip: {x: 'b', y: 'proportion', fill: "var(--bs-body-bg)", anchor: 'bottom-right'}})),
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'b_unrestr', thresholds: 120, fill: 'var(--plot-rule-color-4)', tip: {x: 'b_unrestr', y: 'proportion', fill: "var(--bs-body-bg)", anchor: 'bottom-left'}, opacity: 0.5})),
      Plot.text([[105, 0.094]], {text: ['N=10,000'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.frame()
    ],
    width: width
  })
  return plot
}
{
  const plot = Plot.plot({
    x: {
      label: 'Number of pulls',
      domain: [0, 110],
    },
    y: {
      label: 'Density',
      domain: [0, 0.4],
    },
    color: {
      legend: true,
      domain: ['w', 'z'],
      range: ['var(--plot-rule-color-2)', 'var(--plot-rule-color-3)']
    },
    marks: [
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'w', thresholds: 50, fill: 'var(--plot-rule-color-2)', tip: {x: 'w', y: 'proportion', fill: "var(--bs-body-bg)", anchor: 'bottom-right'}})),
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'z', thresholds: 100, fill: 'var(--plot-rule-color-3)', tip: {x: 'z', y: 'proportion', fill: "var(--bs-body-bg)"}})),
      Plot.text([[105, 0.38]], {text: ['N=10,000'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.frame()
    ],
    width: width
  })
  return plot
}
{
  const plot = Plot.plot({
    x: {
      label: 'Number of pulls',
      domain: [0, 110],
    },
    y: {
      label: 'Density',
      domain: [0, 0.4],
    },
    color: {
      legend: true,
      domain: ['v', 'w_cost'],
      range: ['var(--plot-rule-color-5)', 'var(--plot-rule-color-6)']
    },
    marks: [
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'w_cost', thresholds: 6, fill: 'var(--plot-rule-color-6)', tip: {x: 'w_cost', y: 'proportion', fill: "var(--bs-body-bg)", anchor: 'bottom-left'}})),
      Plot.rectY(w_a_data, Plot.binX({y: 'proportion'}, {x: 'v', thresholds: 100, fill: 'var(--plot-rule-color-5)', tip: {x: 'v', y: 'proportion', fill: "var(--bs-body-bg)"}})),
      Plot.text([[105, 0.38]], {text: ['N=10,000'], font: 'var(--bs-small)', color: 'var(--bs-body-color)'}),
      Plot.frame()
    ],
    width: width
  })
  return plot
}
```

The function used to generate this distribution is:

$$b_a = z_a + w_{min} + \hat{w}_{a} = v_a - c\hat{w}_{a}$$

where the second equality holds at an interior solution for $\hat{w}_a$. Substituting the in-game currency equation into the budget ($b_a=z_a+w_a$), we get:

$$b_a = z_a + w_{min} + \frac{v_a - z_a - w_{min}}{1+c} = \frac{v_a + c(z_a + w_{min})}{1+c}$$

We use the restricted in-game currency, so the budget must be weakly larger than $w_{min}=20$. The budget can exceed $w_{max}=70$ due to real-money budgets, which implies that any excess over $w_{max}$ is due to real money. We cannot, though, parse the exact breakdown between in-game currency and real money.
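For completeness, a sketch of how such a sample could be generated (the uniform draws for $v_a$ and $z_a$, and $c=0.2$, are assumptions for illustration, not the distributions actually used for the data above):

```python
# Sketch: generate a full-budget sample b_a = z_a + w_a with restricted
# in-game currency (w_min = 20, w_max = 70; other values are placeholders).
import numpy as np

rng = np.random.default_rng(1)
c, w_min, w_max = 0.2, 20, 70
v = rng.uniform(0, 100, 10_000)  # assumed valuations
z = rng.uniform(0, 100, 10_000)  # assumed real-money budgets

w = np.clip(w_min + (v - z - w_min) / (1 + c), w_min, w_max)
b = z + w
print(b.min() >= w_min, round(b.mean(), 1))  # budgets never fall below w_min
```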